Reviving phonetic algorithms for better search relevance (Berlin Buzzwords 2026)

2026-06-09 09:30–09:50, Kesselhaus

Fuzzy search is a double-edged sword: it fixes typos but drowns users in noise on large corpora. At INA, we revived ancient phonetic algorithms to improve relevance. This session compares fuzzy vs. phonetic search on a massive archive, showing how "sounding right" beats "spelling close."

When users are unsure of a spelling, fuzzy search is the standard engineering solution. However, at the scale of INA, the French National Audiovisual Institute, we found that standard fuzziness hits a wall. On a massive corpus, "approximate" matching retrieves a paralyzing amount of noise, degrading the user experience.
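To make the fuzziness trap concrete, here is a minimal Python sketch of edit-distance matching, the mechanism behind fuzzy search. It uses plain Levenshtein distance (Lucene-based engines such as Elasticsearch also count a transposition of two adjacent characters as a single edit by default). The function names and the vocabulary are hypothetical illustrations, not INA code or data.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, vocabulary: list[str], max_edits: int) -> list[str]:
    """Keep every term within `max_edits` edits of the query."""
    return [term for term in vocabulary if levenshtein(query, term) <= max_edits]


# Hypothetical toy vocabulary of French words and names (not INA data).
VOCABULARY = ["martin", "marin", "matin", "marie", "mari", "marais", "mardi", "main", "morin"]

print(fuzzy_matches("martin", VOCABULARY, max_edits=1))  # ['martin', 'marin', 'matin']
print(len(fuzzy_matches("martin", VOCABULARY, max_edits=2)))  # 9
```

With one edit, "martin" already matches "marin" (sailor) and "matin" (morning); with two edits every word in the list gets through, including "mardi" (Tuesday) and "main" (hand). On a large corpus, each extra edit lets in proportionally more such look-alikes.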

To solve this, we looked back to move forward. We revived and re-implemented "ancient" phonetic algorithms, some dating back decades, to test if matching by sound could outperform matching by character distance.
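Soundex, one of the oldest of these algorithms (patented in 1918 and designed for English surnames), shows what "matching by sound" means in practice: each word is reduced to its first letter plus three digits encoding consonant classes, so spellings that sound alike collapse to the same key. The following is a minimal Python sketch of American Soundex, not INA's implementation; `soundex` is an illustrative name, and the sketch only codes ASCII letters.

```python
# Consonant classes of American Soundex; vowels, y, h and w get no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex key of `word` (e.g. 'R163')."""
    # Non-ASCII letters such as 'é' are simply dropped in this sketch.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            key += digit
        # h and w are transparent: same-class consonants around them are coded once.
        # Vowels reset `prev`, so same-class consonants around a vowel are coded twice.
        if ch not in "hw":
            prev = digit
    return (key + "000")[:4]  # pad with zeros, then truncate to 4 characters


for name in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
    print(name, soundex(name))
```

"Robert" and "Rupert" share the key R163 although they differ by two letters, which is exactly the kind of grouping edit distance cannot express. Because the letter classes were tuned for English, a plain Soundex key says little about French silent letters, which hints at why language-aware encoders such as Beider-Morse are worth comparing.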

In this talk, we share our journey in tuning relevance for the French language, which is notoriously difficult due to silent letters and homophones. We will cover:

The Fuzziness Trap: Why increasing edit distance failed to solve our precision/recall trade-off.
Algorithm Showdown: A comparative analysis of standard fuzzy querying vs. phonetic analysis (e.g., Soundex, Beider-Morse, Metaphone) within our search pipeline.
Implementation: How we integrated these phonetic tokens into our indexing strategy to filter noise without losing relevant results.
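One common way to index phonetic tokens alongside the original text, sketched here as an assumption rather than INA's actual setup, is a multi-field: the same field is indexed once as normal text and once through a phonetic token filter. The settings follow the Elasticsearch `phonetic` token filter provided by the analysis-phonetic plugin (which must be installed); the field name `title`, the names `french_phonetic` and `phonetic_analyzer`, and the Beider-Morse options are hypothetical choices. The body is written as a Python dict so it can be passed to a client or dumped as JSON.

```python
import json

# Hypothetical index body: "title" is searchable both as plain text and,
# through the "title.phonetic" subfield, as Beider-Morse phonetic tokens.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "french_phonetic": {
                    "type": "phonetic",          # token filter from analysis-phonetic
                    "encoder": "beider_morse",
                    "rule_type": "approx",
                    "name_type": "generic",
                    "languageset": ["french"],  # restrict rules to French
                }
            },
            "analyzer": {
                "phonetic_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "french_phonetic"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "phonetic": {"type": "text", "analyzer": "phonetic_analyzer"}
                },
            }
        }
    },
}

print(json.dumps(INDEX_BODY, indent=2))
```

Keeping the original field untouched means exact-spelling matches remain available for scoring, while the phonetic subfield only adds sound-alike candidates.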

You will leave with a clear understanding of when to abandon standard fuzziness and how to leverage phonetic search to clean up your own noisy results.
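At query time, a phonetic subfield can stand in for fuzziness. The sketch below assumes the hypothetical `title` / `title.phonetic` mapping and builds two Elasticsearch request bodies as Python dicts: a conventional fuzzy `match` with `fuzziness: AUTO`, and a `bool` query that ranks exact spellings first while letting sound-alike spellings match through the phonetic subfield. The function names and the boost value are illustrative, not tuned values from the talk.

```python
import json


def fuzzy_query(user_input: str) -> dict:
    """Baseline: edit-distance matching on the plain text field."""
    return {"query": {"match": {"title": {"query": user_input, "fuzziness": "AUTO"}}}}


def phonetic_query(user_input: str, exact_boost: float = 3.0) -> dict:
    """Exact-spelling matches score higher; phonetic matches widen recall."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": user_input, "boost": exact_boost}}},
                    {"match": {"title.phonetic": {"query": user_input}}},
                ],
                "minimum_should_match": 1,  # at least one clause must match
            }
        }
    }


print(json.dumps(phonetic_query("martain"), indent=2))
```

Whether the phonetic query actually returns less noise than the fuzzy baseline depends on the corpus and the encoder, so both bodies are worth running side by side against real queries.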

Level: Beginner

See also: Presentation (3.6 MB)

Pietro Mele

Italian by birth and recently adopted by France, I am a constant learner dedicated to computer science and discovery, whether that means uncovering solutions or gaining insights.

Radu Pop

Radu provides consulting services as a Solutions Architect at Adelean. He handles projects around Elasticsearch and Adelean’s A2 search technology, and oversees the integration and evolution of search engines within large e-commerce platforms, marketplaces, and organizations' data lakes. Before joining Adelean, Radu gained solid experience in web archiving, operating large-scale crawling systems for several European research projects. He holds a PhD in Computer Science and an MSc in Distributed Systems.
