2024-06-10 – Frannz Salon
From relevance guesswork to granular search experimentation at scale. Evaluate modifications to your search, such as incorporating synonyms, adjusting field boosts, adding vector search, or changing the product assortment, on real data rather than relying solely on intuition.
Measuring the effectiveness of changes in search functionality, whether significant or minor, remains a persistent challenge. Even experienced teams often fall back on search-relevance feedback labels to gauge outcomes. While web-scale experimentation, such as A/B or multivariate testing, is widely practiced, only a few companies use it to improve their search systems incrementally at the query level.
A/B testing in the realm of search presents unique complexities. It involves various considerations and lacks a universal approach. Challenges include selecting appropriate evaluation metrics, choosing suitable randomization units, and handling imbalanced and sparse experimental data in order to establish precise decision boundaries, especially when traffic per query is limited.
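To make the sparse-data challenge concrete, here is a minimal sketch (not the speaker's actual method) of a per-query two-proportion z-test on click-through rate between two search variants. The query strings and counts are hypothetical; the point is that a head query can reach a clear decision boundary while a long-tail query with the same relative CTR lift stays inconclusive.

```python
import math

def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test on click-through rate between variants A and B.

    Returns the z statistic; |z| > 1.96 roughly corresponds to p < 0.05.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        return 0.0
    return (p_b - p_a) / se

# Hypothetical per-query experiment data: query -> ((clicks, views) A, (clicks, views) B)
experiments = {
    "running shoes": ((100, 1000), (150, 1000)),   # head query: plenty of traffic
    "rare sku 12345": ((1, 12), (2, 9)),           # tail query: sparse data
}

for query, ((ca, va), (cb, vb)) in experiments.items():
    z = two_proportion_ztest(ca, va, cb, vb)
    verdict = "significant" if abs(z) > 1.96 else "inconclusive"
    print(f"{query}: z={z:.2f} ({verdict})")
```

For the head query the lift is clearly significant, while the tail query, despite a larger relative lift, cannot be distinguished from noise — one reason query-level experimentation needs more care than a single site-wide A/B test.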
Over time, we've developed, refined, and operated our "Query Testing" capability. In this talk, I aim to share the insights gleaned from that journey.
Andreas Wagner is a serial entrepreneur who is passionate about teaching software to comprehend human behavior. With more than 15 years of extensive experience in the retail and eCommerce industries, he has held various positions in different companies. Currently, he is serving as the CTO and strategist at searchHub.io, where he continues to pursue his mission of improving software's ability to understand human needs.