Berlin Buzzwords 2026

09:30
09:30
5min
Opening Session
Paul Berschick

Join us as we kick off Berlin Buzzwords 2026!

Kesselhaus
09:35
09:35
45min
Building Resilience: The Next Decade of Open Source
Ruth Suehle

Over 25 years, open source has become vital digital infrastructure. However, its future relies on human resilience, not just code. To combat burnout, funding gaps, and new regulations, we must move beyond old methods and address sustainability through global policy, security, and community health.

Kesselhaus
10:20
10:20
20min
Breakfast Break
Kesselhaus
10:20
20min
Breakfast Break
Maschinenhaus
10:20
20min
Breakfast Break
Palais Atelier
10:20
20min
Breakfast Break
Frannz Salon
10:40
10:40
20min
Low-Resource Languages as Stress Tests for NLP Data
Priscilla Lola Adenuga

Low-resource languages expose weaknesses in NLP systems that are often hidden by benchmark data. Drawing on experience annotating fieldwork data, this talk shows how ambiguity and annotation decisions reveal fundamental data quality issues relevant to real-world NLP pipelines.

Kesselhaus
10:40
20min
OpenSearch Software Foundation: 1 Year of Open Governance
Kris Freedain, Carlos Rolo

In this presentation, we will talk through moving a major open source project into a foundation, and the benefits that open governance and a vendor-neutral home have proven through sustained growth in community contributions.

Maschinenhaus
10:40
20min
Towards Chunk-less RAG
Carles Onielfa

Retrieval-Augmented Generation (RAG) systems rely on pre-chunked documents, tying retrieval to arbitrary boundaries. This talk explores an experimental approach that surfaces semantically relevant text spans, without chunking. We'll share surprising findings and examine whether this technique points toward a viable chunk-free retrieval paradigm.

Frannz Salon
11:10
11:10
40min
Apache Solr 10: What's Coming up for Vector Search
Alessandro Benedetti, Ilaria Petreti, Anna Ruggero

With Apache Solr 10 out, there are plenty of goodies coming up for vector-search aficionados.
From scalar and binary quantization to speed up your search and reduce the memory footprint, to early termination and hybrid approaches to navigate the HNSW graph.
Join us if you want to learn about the big steps forward of Apache Solr vector search!

Maschinenhaus
11:10
40min
Dynamic Broker-Side Filtering for Kafka
David Kjerrumgaard, Álvaro Rodríguez

KAFKA-6020 has been open for 7 years. This talk demos broker-side filtering for Kafka with sub-millisecond latency (p99 < 25ms). Live demo with working code shows how it reduces network costs, simplifies consumers, and enables new use cases. Real-world validation from financial services and logistics deployments.

Kesselhaus
11:10
40min
Mentoring In Open Source in the Age of AI
Tilda Udufo, Busayo Ojo

Open source mentorship changed overnight with AI tools. Contributors submitted polished code they couldn’t explain, making learning harder to assess. This talk shares what we learned mentoring Outreachy contributors—what failed, what worked, and what we’re still figuring out.

Palais Atelier
11:10
40min
No 0-day required, just target the AI coding assistant!
Leo Visser

Discover how attackers can manipulate AI coding assistants through hidden text, typosquatting and code errors. Learn to detect concealed instructions and set up trusted dependencies to keep unsafe code out of your environment.

Frannz Salon
12:00
12:00
40min
Beyond the Hype: When Apache Flink Solves Real Problems
Naci Simsek

When does Apache Flink solve real problems versus add complexity? Explore use cases where Flink becomes essential, such as fraud detection, CDC, and real-time analytics, versus cases where batch or Kafka Streams suffice. Compare stream engines (Flink, Spark) with platforms (Kafka, Pulsar) to confidently decide when streaming delivers value.

Palais Atelier
12:00
40min
Constant-Time Aggregations with Star-Tree in OpenSearch
Sandesh Kumar, Shailesh Kumar Singh

Discover how OpenSearch breaks linear scaling. Inspired by Apache Pinot, the Star-Tree index moves performance dependency from document count to field cardinality. Learn how we extended Lucene’s DocValues to build multi-dimensional materialized views that deliver sub-second analytics on billion-scale datasets for observability workloads.

Maschinenhaus
12:00
40min
OSS Security: Lessons from 10+ Years at Apache Solr
Jason Gerlowski

How are security decisions big and small made in a distributed open source community? Come find out at this session where users will gain insights and examples (both good and bad) to take back to their own projects.

Frannz Salon
12:00
45min
The Agent Era: How AI Agents Are Reshaping Data Platforms
Monica Sarbu, Danica Fine, Philipp Krenn

AI agents have quietly become some of the most demanding users of modern data platforms, and most of those platforms weren't built with them in mind. In this panel, leaders from Snowflake, Elastic, ClickHouse, and Xata share what agentic workloads actually look like in production: what broke, what had to be rebuilt, and where the architecture is heading.

Kesselhaus
12:40
12:40
80min
Lunch Break
Maschinenhaus
12:40
80min
Lunch Break
Palais Atelier
12:40
80min
Lunch Break
Frannz Salon
12:45
12:45
75min
Lunch Break
Kesselhaus
14:00
14:00
40min
Agentic Retrieval: Building Self-Optimizing Search Systems
Skip Everling, Jo Kristian Bergum

Relevance feedback loops used to take months. AI agents can now compress the process to seconds. This talk explores agentic retrieval: systems where agents adjust scoring models, schema, and indexing in real time. Learn how to build retrieval infrastructure with verifiable APIs that enable agents to optimize their own search context.

Kesselhaus
14:00
40min
Apache Spark Declarative Pipelines in Action
Frank Munz

Learn Spark 4.1's brand-new Declarative Pipelines, a paradigm shift replacing imperative code with simple declarations. We'll build a real-time data pipeline together, processing streaming ADS-B flight data from tens of thousands of aircraft overhead.

Palais Atelier
14:00
40min
Livecoding Data Visualisations with Streamlit
Kris Jenkins

Streamlit is an open source Python library that lets you present data to people, without having to become a frontend developer. It's easy to learn, fast to build with, and should be in every data-wrangler's toolkit. In this talk you'll learn everything you need to know to get started.

Frannz Salon
14:00
40min
Turning the database inside out again
Tom Scott, Roman Kolesnev

We rethink data systems by putting streams at the center. Expanding on Martin Kleppmann's "Turning the Database Inside Out", this talk shows how Apache Kafka and Apache Iceberg together provide durable storage, indexing, and rich views that eliminate brittle ETL and unify real-time and historical analysis. A new way to see databases and streams.

Maschinenhaus
14:50
14:50
20min
From OLTP to OLAP: Is PostgreSQL Eating Analytics Too?
Daniel Seybold

Can PostgreSQL become a serious analytics engine? With emerging columnar extensions, PostgreSQL is pushing beyond OLTP into OLAP territory. This talk explores the current columnar landscape, architectural trade-offs, and how far PostgreSQL can go compared to analytical engines like ClickHouse.

Maschinenhaus
14:50
20min
Scientific Data Under Threat in Today’s America
Uwe Schindler

A brief look at how scientific data came under political pressure during the presidency of Donald Trump, and how scientists and data repositories in Europe worked to protect public access to evidence-based research, ensuring access for data science in a well-structured way.

Frannz Salon
14:50
20min
Ultraviolet: Turn Hidden Document Data into an AI Advantage
Alessio Vertemati

Every PDF hides a world of structure, metadata and embedded signals that can silently influence AI-based processing. With ultraviolets, we reveal how those can be exploited for malicious purposes and even become powerful tools for smarter applications. Designing for both humans and machines becomes a vital aspect of AI experience design.

Kesselhaus
15:20
15:20
40min
How Apache Iceberg Enables Multi-Engine Data Platforms
Geetha Anne

The session will cover operational best practices, including metadata management, file sizing, compaction strategies, and performance tuning at scale. Attendees will leave with practical guidance for designing and operating open, flexible, multi-engine data architectures built on Apache Iceberg, enabling faster analytics, lower operational flexibility

Kesselhaus
15:20
40min
Let LLMs Wander: Engineering RL Environments
Stefano Fiorucci

What if, instead of learning only from examples, Language Models could explore crafted Environments, little worlds where they can act and improve autonomously?

Join me to see how Reinforcement Learning Environments work, how to build them with open-source tools, and how to use them to evaluate and train LLMs/Agents.

Frannz Salon
15:20
40min
Streamling: Lightweight, Extensible Streaming on DataFusion
Xiao Meng, Rafael Aguiar

Apache DataFusion is moving beyond batch into streaming. We built Streamling, a Rust streaming engine that uses DataFusion planning and Arrow RecordBatch streams for real-time SQL/WASM transforms. This talk covers how we built it, highlights key features (FFI plugins, WASM transforms, and dynamic tables), and shares production lessons.

Maschinenhaus
15:20
40min
Why Choose One: Multi-Engine Analytics with Apache Wayang
Zoi Kaoudi, Haralampos Gavriilidis

Choosing the best engine for each data task sounds right, but in modern data stacks doing so requires expertise and effort. Apache Wayang, a recently graduated TLP, addresses this by decoupling logical dataflows from execution engines. From big data platforms to SQL and ML engines, Wayang enables cross-platform execution that maximizes performance.

Palais Atelier
16:00
16:00
30min
Coffee Break
Kesselhaus
16:00
30min
Coffee Break
Maschinenhaus
16:00
30min
Coffee Break
Palais Atelier
16:00
30min
Coffee Break
Frannz Salon
16:30
16:30
40min
10x CouchDB Performance Gains for a AAA Game Launch
Jan Lehnardt

All software benchmarks and claims of performance are carefully crafted lies, and this talk is no different. Instead of giving you a quick “do steps one, two, three for a magic speedup”, we aim to explain how we arrived at the changes we made and how we rigorously tested those changes to make sure we understood their impact.

Kesselhaus
16:30
40min
Event-driven Agents with Complex Event Processing in Flink
Steffen Hoellinger

Event-driven Agents calling LLMs can be combined with Pattern Recognition and Anomaly Detection in Apache Flink in smart ways to increase cost efficiency, avoid hallucinations and enforce predictable, deterministic behavior. Specifically in a business process context, this architecture provides opportunities for continuous real-time process mining.

Palais Atelier
16:30
40min
SPRUCE it up! Open Source GreenOps at scale
Julien Nioche

GreenOps adoption is stalled by missing data from cloud providers. SPRUCE is an open-source, scalable platform built on Apache Spark that enriches cloud usage reports with open models to quantify carbon impact, build insights, and help teams reduce both emissions and cloud spend.

Frannz Salon
16:30
40min
Search is Back: Solving the "Context Crisis" for AI Agents
David Louis Hollembaek, Vincent Pistor

Why do smart agents make dumb mistakes? The culprit is context, an old problem with new solutions. Let's fast-forward through 20 years of search evolution to fix the missing link in today’s Agentic AI.
We’ll demonstrate how to combine Knowledge Graphs and Vector Search to build reliable, context-aware applications using open-source tools.

Maschinenhaus
17:20
17:20
45min
AI is here – time to throw away our search engines?
Charlie Hull, Atita Arora, Jo Kristian Bergum, Dmitry Kan, Evgeniya Sukhodolskaya

Why do we even need traditional search when AI can do everything? Or is it foolish to ignore simple, proven techniques for delivering great results? What's the best way to combine old and new? Join our panel of experts for a fun and provocative debate!

Kesselhaus
17:20
40min
Building a Local News RAG: The Quest for Trustworthiness
Marcel Dokters

We will show you how we built a local newspaper RAG and all the problems that came up along the way, from trustworthiness to customer wishes, search optimization, and generation problems. Local villages that LLMs know nothing about, content that is semantically the same, and outdated information are only part of the journey we made.

Maschinenhaus
17:20
40min
Floe: Policy-Based Table Maintenance for Apache Iceberg
Neelesh Salian

Iceberg maintenance procedures work. Orchestrating them across hundreds of tables is the problem. Floe is an open-source system that treats maintenance as policy: glob patterns, schedules, and health-driven triggers that gate operations on real table metrics. Supports 7 catalogs, executes via Spark or Trino.

Palais Atelier
17:20
40min
Observability’s Sixth Sense: Detecting Anomalies in Metrics
Diana Todea

In this talk, we look at anomaly detection as a complementary way of working with metrics. Instead of relying on predefined limits, anomaly detection focuses on identifying behavior that deviates from what is normally observed over time. The focus is on how developers can interpret these signals, where anomaly detection is useful, and where it is not.

Frannz Salon
18:05
18:05
180min
Get-Together

Join us for food and drinks at Palais Kulturbrauerei!

Kesselhaus
09:30
09:30
20min
Circular Dependency Fixes when Bootstrapping a Golden Set
Radu Gheorghe, Rafał Kuć

For a golden set, you need queries. Even if you have them, you can’t judge all docs for each query. Only the top N. How do we rank the top N? See the circular dependency? We’ll talk about ways to untangle it: lexical search, significant terms, training an embedder from scratch, etc. By iteratively refining data and queries, we'll get there.

Maschinenhaus
09:30
20min
Reviving phonetic algorithms for better search relevance
Pietro Mele, Radu Pop

Fuzzy search is a double-edged sword: it fixes typos but drowns users in noise on large corpora. At INA, we revived ancient phonetic algorithms to improve relevance. This session compares fuzzy vs. phonetic search on a massive archive, showing how "sounding right" beats "spelling close."

Kesselhaus
09:30
20min
Writes, 3 ways: Postgres, Apache Kafka® and Apache Iceberg™
Celeste Horgan

Learning new things is hard, but a useful way to think about new things is by comparing them to things you already know. In this talk, we'll compare writes between 3 different popular data services: Postgres, Apache Kafka and Apache Iceberg. In doing so, we'll learn a bit about the evolution of how we've thought of data storage as developers.

Frannz Salon
10:00
10:00
40min
From Inverted Index to Columnar Vectorized Execution Search
Saurabh Singh, Ankit Jain

Search engines are converging with analytical data systems. This talk explores how columnar data layouts, SIMD-accelerated execution, and bulk-oriented processing are reshaping search internals. We examine where traditional models fall short and how hardware-aware techniques from analytics engines are defining the next search infrastructure.

Kesselhaus
10:00
40min
GitOps for n8n: Treating Workflows as Code
Joao Gilberto Magalhaes

n8n-gitops is an open-source CLI that applies GitOps principles to n8n workflows. This talk shows how workflows can be exported, reviewed, versioned, and deployed from Git instead of manually promoted via the UI. Through a live demo, we explore safer deployments, rollbacks, and lessons learned operating automation as code.

Frannz Salon
10:00
40min
Kafi Streams: Complex Stream Processing Made Simple
Ralph Matthias Debusmann

You can finally stop caring about co-partitioning, state stores and eventual consistency. Kafi Streams, built on (Py)DBSP, treats streaming like batch — strongly consistent, no special concepts. An Open Source Python library for the 80% of use cases that don't need extreme scale. Fully incremental stream processing for everyone, from day one.

Palais Atelier
10:00
40min
Text-to-Struct: Fine-tuning SLMs for Query Intent
Hugo Jimenez, Sandra Bullón

Hybrid search fails on complex intent: vector search misses constraints, keywords miss nuance. This talk explores fine-tuning SLMs for 'Query Understanding'—transforming vague inputs into structured requests. Learn to extract metadata, expand terms, and route intent to build a search engine that does the hard work for your users.

Maschinenhaus
10:40
10:40
30min
Breakfast Break
Kesselhaus
10:40
30min
Breakfast Break
Maschinenhaus
10:40
30min
Breakfast Break
Palais Atelier
10:40
30min
Breakfast Break
Frannz Salon
11:10
11:10
40min
Context-Aware Segments: Solving the "Scatter-Read" Problem
Rishav Sagar, Tejas Shah

Traditional OpenSearch segments are context-blind, scattering data across multiple segments. We introduce Context-Aware Segments (CAS), an architecture that brings "sharding" logic to the segment level. By enforcing document locality during indexing, we slashed query latency and minimized data footprint through superior pruning and compression.

Maschinenhaus
11:10
40min
DuckDB beyond the notebook
Matthias Niehoff

Most people know DuckDB as a fast analytics tool for notebooks and scripts. But embedded OLAP enables much more: browser-based analytics via WebAssembly, serverless data processing, and lightweight data apps — without heavy infrastructure. This talk shows how DuckDB changes the way we build data-driven applications.

Palais Atelier
11:10
40min
Real-Time ML Pipelines: Feature Chaining with Chronon
Varant Zanoyan

Modern ML applications demand features computed in near real-time with sub-100ms latencies. This talk dives into Chronon, an OSS feature platform bridging streaming data infrastructure and production ML. Using a two-tower search pipeline example, we'll show how we can chain embeddings with tabular features while minimizing hot-path computation.

Frannz Salon
11:10
40min
When better retrieval makes agents worse
Lester Solbakken

Agentic systems can break not because information is missing, but because persuasively wrong context gets promoted into action. We examine a recurring pattern: retrieval metrics improve while agent behavior degrades as distractors enter multi-step loops. We show why relevance, reliability, and security are tightly connected in agentic retrieval.

Kesselhaus
12:00
12:00
40min
C++ Search for Database Kernels: Built In, Not Bolted On
Andrey Abramov

IResearch is an Apache 2.0 C++ search engine built to live inside databases. We'll benchmark it against leading open-source search engines, show why vectorized scoring is the next frontier for information retrieval engines, share the mistakes we made over a decade of development and explore how database-native search fits modern query execution.

Maschinenhaus
12:00
40min
Keeping data private in real-time pipelines
Olena Kutsenko

Real-time data is awesome… until you realize it’s leaking names, emails, and locations. In this talk, you’ll learn how to keep streaming data private, from simple masking to tricks that beat re-identification. All with live demos and some juicy real-world stories.

Kesselhaus
12:00
40min
OTel + Apache Iceberg: The New Standard for Observability
Yingjun Wu

Observability is moving from vendor stacks to open standards. This talk presents a design where OpenTelemetry provides collection and semantic context, and Apache Iceberg is the data layer for logs, metrics, and traces. We cover portability, governance, agent investigation, and write-path pitfalls: drift, small files, compaction.

Palais Atelier
12:00
40min
The Failures That Don't Crash: MLOps for AI Agents
Bartosz Mikulski

This talk takes four reliability patterns from distributed systems and shows what they look like inside an agent architecture. How to shadow-test an agent. Why your circuit breakers need confidence thresholds. What an eval harness looks like when your system is non-deterministic. And why human oversight degrades faster than anyone admits.

Frannz Salon
12:40
12:40
80min
Lunch Break
Kesselhaus
12:40
80min
Lunch Break
Maschinenhaus
12:40
80min
Lunch Break
Palais Atelier
12:40
80min
Lunch Break
Frannz Salon
14:00
14:00
40min
How to Tell If Your Agent Used the Right Stuff
Apurva Misra

Many so-called “agent failures” are actually context failures in disguise. In this session, we’ll explore how to tell whether your agent truly saw and used the right context, using techniques like tracing and attribution, golden datasets for context-aware evaluation, and targeted probes to test retrieval quality.

Frannz Salon
14:00
40min
One GPU, Four Retrieval Modes: Multi-Model Search Serving
Filip Makraduli

Competitive search now needs dense embeddings, sparse vectors, ColBERT, and cross-encoder reranking. Most teams run four separate containers. This talk shows how to serve all four from one process, walks through building a hybrid retrieval pipeline with real benchmark data, and covers where each retrieval mode wins and where it wastes compute.

Maschinenhaus
14:00
40min
The Three-Body Problem of Inverse Hybrid Search
Ravindra Harige

When users expect alerts for new products matching an uploaded image, the problem becomes inverse hybrid search. Unlike top-K search, alerting must guarantee fetch-all semantics: zero missed matches across all saved searches, combining vector similarity, boolean filters, and lexical signals. We show why this breaks traditional scaling intuition.

Kesselhaus
14:00
40min
What If We've Been Scaling Stream Processing Wrong All Along
Hartmut Armbruster

We’ve normalised extraordinary inefficiency in stream processing. Thousands of events/sec don't justify repartition storms, serialization overhead, and state migration. This talk explores a different path: the Kafka Streams DSL, Flink-like exactly-once semantics, Project Loom, and a challenge to the assumption that stream processing must be distributed.

Palais Atelier
14:50
14:50
40min
Beyond Grep: Search for Reliable Coding Agents
Amine GANI, Roudy Khoury

Coding agents succeed in verifiable loops (compiler + tests), but large repos still expose retrieval weaknesses.
This session explores how lexical, structural, and semantic search can provide cleaner context for LLMs. We compare tradeoffs and evaluation approaches to improve reliability without inflating token cost.

Kesselhaus
14:50
40min
Detecting Hidden Bias in Datasets Before Models Fail
Stas Don

Hidden bias in datasets silently breaks machine learning systems in production. This talk shows how to detect data imbalance, leakage, and coverage gaps early using practical metrics, visualizations, and open-source tools—before misleading offline metrics turn into costly real-world failures.

Palais Atelier
14:50
40min
Sunset for the Wild West: Making ML disciplined by default
William Benton

Many novel machine learning techniques started as clever hacks that just happened to work, but the demands of building real systems can be at odds with this creative culture. Learn about our open-source stack to improve quality-of-life for ML researchers and infrastructure teams alike — and how their concerns aren't as different as you might think.

Frannz Salon
14:50
40min
Zero downtime index upgrade in Apache Solr
Rahul Goswami

In this talk we'll explore how Apache Solr introduced the capability to upgrade an index in-place with zero downtime. This upgrade path helps prepare the index for a future Solr major version upgrade without needing to recreate the index from source, as is the case with Lucene-based search engines today.

Maschinenhaus
15:30
15:30
30min
Coffee Break
Kesselhaus
15:30
30min
Coffee Break
Maschinenhaus
15:30
30min
Coffee Break
Palais Atelier
15:30
30min
Coffee Break
Frannz Salon
16:00
16:00
40min
Building Schema-Free Applications with RDF
Gosia Zagajewska, Mateusz Charytoniuk

RDF was designed for the semantic web, but it turns out to be a perfect fit for systems where structure emerges from user interaction, not upfront design. This talk covers how to build applications entirely on RDF triples, translate natural language to SPARQL with small, open source language models, and discover implicit knowledge in user input.

Maschinenhaus
16:00
40min
Correctness Too Cheap To Meter: Formal Verification and LLMs
Emilie Ma

Formal methods are powerful tools to verify software systems' correctness and reliability. However, manually writing system specs is time-consuming and hard to maintain. LLMs can help with this burden.
We'll share new research into tools to automate formal methods workflows and learnings from how LLMs currently perform.

Kesselhaus
16:00
40min
Escaping the Cloud: High-Performance AI in your Browser
Johannes Kolbe

Server-side inference is the bottleneck of modern AI, creating costs and privacy hurdles. But what if the solution is scaling down to the browser? This session investigates Client-Side AI using WebGPU, ONNX Runtime, and Transformers.js. We’ll explore the reality of hardware access, model size, and the 2026 trade-offs of browser-based execution.

Frannz Salon
16:00
40min
What you should know about constraints in PostgreSQL 18
Gülçin Yıldırım Jelinek

This talk explains how constraints work in Postgres by exploring the pg_constraint catalog and core concepts like table vs. column constraints, constraint triggers, domains and constraint deferrability through SQL queries. It then covers what’s new in Postgres 18 including temporal keys, NOT NULL as a first-class constraint, NOT ENFORCED and more.

Palais Atelier
16:50
16:50
20min
AI in the physical world: from observation to discovery
Dmitriy Kostunin, Julian von Hoerschelmann-Schliwinski

In 2026, AI is moving beyond digital tasks into the physical world. It increasingly interacts with instruments, experiments, and real-world data. Physicists stand at this frontier, using deep learning, LLMs, and agents to analyze nature itself. What have we learned about AI when it meets reality?

Palais Atelier
16:50
20min
From Legacy Search to Vespa: What a Real PoC Taught Us
André Charton, Valeriia Platonova

For years, Germany’s largest classifieds website relied on a search-first relevance approach because structured data was sparse. This talk shares how we introduced Vespa in the Motors category, enriched signals with embeddings and extracted attributes, and migrated step by step; what worked, what failed, and which lessons only a real PoC reveals.

Kesselhaus
16:50
20min
How to Survive the Vortex of LLM Change
Carmen Iniesta, Carles Onielfa

The LLM ecosystem changes faster than most teams can adapt. This talk shares our experience and the practical lessons we’ve learned while building an intelligent search product in a world where models, tools, and best practices constantly evolve.

Maschinenhaus
17:10
17:10
10min
Closing Session
Paul Berschick

Join us as we wrap up Berlin Buzzwords.

Kesselhaus