Deep dive into an Elasticsearch plugin for query-time joins
2023-06-20 , Palais Atelier

Siren Federate is an Elasticsearch plugin for joining inverted indices at query-time. Learn in this talk about its inner workings and how it complements features of Elasticsearch like runtime fields.


Data are often at the basis of critical decisions in many different sectors, ranging from e-commerce to cyber-security: machine and search logs, user information, metrics over transactions, etc. Data in such domains are by nature voluminous and inter-connected. Analytics systems are expected not only to search and aggregate those data, but also to join them: join operations are often necessary in order to explore inter-connected data and get insights from them. Analysts often interact with such systems by following an explorative and iterative process that represents their train of thoughts. Such systems then must have fast response times to avoid impeding the mental process of the analysts. Whilst Elasticsearch is a fantastic high performance analytics engine, it presents some limitations in certain cases when it comes to joining data from different indices at query-time.

In this talk, we will present Siren’s ten years-long effort in implementing distributed joins on top of Elasticsearch. We will introduce Siren Federate – our Elasticsearch plugin that provides query-time join capabilities over indices – and we will discuss some of the challenges we had to tackle during its development. We will begin by describing how joins are performed by Federate, from the reception of a query till the computation of its results. Then we will show the importance of caching join results for performance, and how a cache can be efficiently implemented. Talking about performance, we will explain the benefits of adopting a vectorized data processing model by showing some experimental results. To conclude, we will discuss the importance of the expressiveness of a query language by illustrating the Federate DSL and how it integrates with some advanced features of Elasticsearch such as runtime fields.

See also: Slides (2.0 MB)

Stéphane completed his Ph.D. studies at the University of Galway (Ireland) working on an Information Retrieval engine for Linked Data and became deeply interested in that field. He then transitioned to working for Siren, a spin-off of that research endeavor. From that point on, he worked on Federate, an Elasticsearch plugin for computing joins between inverted indices, and was responsible for maintaining and developing various parts, e.g., from the query planner to the interactions with Lucene like with the query cache.