Learning to hybrid search
2023-06-20, Palais Atelier

Combining BM25, neural embeddings and customer behavior with Learning-to-Rank into an ultimate ranking ensemble, with examples on the Amazon ESCI e-commerce search dataset.


Traditional term search has good precision but lacks semantics. Modern neural search is good at semantics but can miss customer behavior. A learning-to-rank approach adapts to customer behavior, but only if your baseline retrieval is already good enough.

The current hype about neural search can give the impression that it's the ultimate solution to all the problems of legacy term search and LTR. You only need [disclaimer: irony ahead] to do the very simple thing of fine-tuning a giant neural network to notice all the dependencies between queries, documents and customer behavior in all the data you have. But what if, instead of replacing A with B, you could combine the strengths of all the approaches?

In this talk, we will take an example of e-commerce search with Amazon's open-source ESCI/ESCI-S dataset and compare traditional text matching and Learning-to-Rank approaches with modern neural search methods on real data. We will show how combining multiple old and new approaches in a single hybrid system can deliver a better result than any of them separately.
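
To make the idea of a hybrid ranker concrete, here is a minimal sketch, not the method presented in the talk. It assumes each candidate document already carries three signals (a BM25 score, an embedding cosine similarity and a historical click rate) and blends them with a weighted sum. The `Candidate` class, the `WEIGHTS` values and the `rank` function are hypothetical names chosen for illustration; the hand-picked weights stand in for what a trained Learning-to-Rank model would learn from relevance labels.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    bm25: float        # lexical match score from term search
    cosine: float      # query/document embedding similarity
    click_rate: float  # historical click-through rate (behavior signal)


# Hypothetical hand-picked weights; an LTR model would learn these instead.
WEIGHTS = {"bm25": 0.5, "cosine": 1.0, "click_rate": 1.0}


def min_max(values):
    """Rescale values to [0, 1] so signals on different scales are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates):
    """Return doc ids ordered by the weighted sum of normalized signals."""
    if not candidates:
        return []
    bm25 = min_max([c.bm25 for c in candidates])
    cosine = min_max([c.cosine for c in candidates])
    clicks = min_max([c.click_rate for c in candidates])
    scored = []
    for c, b, s, r in zip(candidates, bm25, cosine, clicks):
        score = (WEIGHTS["bm25"] * b
                 + WEIGHTS["cosine"] * s
                 + WEIGHTS["click_rate"] * r)
        scored.append((score, c.doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

Normalizing each signal per query before blending keeps an unbounded score such as BM25 from dominating the bounded ones. A learned ranker would typically take the same signals as input features and replace the fixed weights.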

See also: Slides (6.6 MB)

Principal Engineer at Delivery Hero SE, working on search personalization and recommendations. A pragmatic fan of functional programming, learning-to-rank models and performance engineering.

A former software engineer who switched tracks to work more closely with customers and product. Has several years of experience as a Head of Product, talking with customers to understand what they really want and translating that for engineers.