2024-06-11, Maschinenhaus
We present the lessons learnt from improving the search of our global online marketplace with 20 million products sold per year. We successfully moved from a traditional word-match based approach (BM25) to a modern hybrid solution by adding a semantic vector model which we fine-tuned to our domain.
With numerous references to current literature, we will explain how we designed our new system and how we solved the challenges we encountered on both the ML and the engineering side (a data pipeline encoding documents, a live service encoding queries, and the integration with the search engine). We will also share insights from analyzing the impact. Our system is based on OpenSearch, but the lessons can be applied to other search engines as well.
To be more specific, the presentation will cover:
- Status and Shortcomings of our old Search
- Introduction of Hybrid Search
  - general setup
  - recommendations from literature
- Machine Learning
  - model decision (quality vs. latency)
  - fine-tuning and offline evaluation (in particular: using Paid Search / SEM data if you have little historical search performance data of your own)
- Architecture and Implementation (with special consideration of latency)
  - pipeline for encoding documents and indexing the resulting vectors (PySpark)
  - service for live-encoding of queries (Python)
  - implementing hybrid search within OpenSearch (including important filter value extraction from query and ranking scores)
- Learnings and Next Steps
  - observations from our A/B test
  - challenge of cut-off decisions
  - realistic training / evaluation data
  - filter value extraction from query vs. semantic search
  - combining search with auto-complete
  - impact of the call to action in the search bar
  - for which other use cases we successfully apply such semantic vector approaches
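
As a rough illustration of the hybrid search idea in the outline above, the following sketch combines word-match (BM25) scores with semantic vector scores for one query. It assumes min-max normalization per result list followed by a weighted mean, which is one common way to merge the two score scales and not necessarily the method used in the presented system. All function names, the example scores, and the `vector_weight` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    vector_weight: float = 0.5,  # assumption: equal weighting of both signals
) -> List[Tuple[str, float]]:
    """Merge two result lists into one ranking via a weighted mean."""
    bm25_n = min_max_normalize(bm25)
    vector_n = min_max_normalize(vector)
    combined = {
        # A document missing from one list contributes 0 for that signal.
        doc: (1.0 - vector_weight) * bm25_n.get(doc, 0.0)
        + vector_weight * vector_n.get(doc, 0.0)
        for doc in set(bm25_n) | set(vector_n)
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores for four hypothetical product ids.
bm25_scores = {"a": 12.0, "b": 8.0, "c": 4.0}
vector_scores = {"b": 1.0, "c": 0.75, "d": 0.5}
ranking = hybrid_rank(bm25_scores, vector_scores)
```

In this made-up example, product `b` ranks first because it scores reasonably well on both signals, while `d` is found only by the vector side and `a` only by the word-match side. Where to cut off the tail of such a merged list is the kind of decision the outline refers to as the cut-off challenge.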
Ansgar Gruene, Ph.D., is a Senior Data Scientist at GetYourGuide in Berlin. His work focuses on ML approaches to improve the users' search and discovery experience on the platform. He holds a Ph.D. in Theoretical Computer Science and has several years of experience as a Backend Engineer and Data Scientist in the travel industry.
Hello,
I am a senior engineer working in the Search team at GetYourGuide. I am responsible for all the infrastructure and data processing for search, which is exposed via generic APIs. I also have a deep interest in performance and databases in general, and I have past experience contributing to OpenSearch. I enjoy reading technical white papers, as well as reading more about current AI trends in general.
I'm a senior MLOps engineer at GetYourGuide.