Berlin Buzzwords 2025

Performance Tuning Apache Solr for Dense Vectors
2025-06-16, Maschinenhaus

While powerful, dense vector search is not a plug-and-play feature that scales out of the box, particularly when you need to extract maximum performance from limited compute resources. Come learn how we tuned dense vector indexes for our 100M+ document dataset and drastically sped up our queries.


With the recent boom in AI, many organizations are building semantic search stacks from scratch, powered by Apache Solr and dense vectors. Many soon learn just how heavy the compute requirements of vector search are compared with lexical search. If not well tuned, vector search query latency can quickly skyrocket, even on an otherwise reasonably sized dataset.

We experienced this pain firsthand when we started vectorizing a 100M+ document dataset. While one can certainly approach this problem head-on by throwing hardware resources at it, this is neither a cheap nor a fully effective solution.
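
To give a rough sense of scale, the sketch below estimates the storage needed for the raw float32 vectors alone. The 768-dimension figure is an assumption for illustration (the abstract does not state the embedding size), and the estimate ignores HNSW graph overhead, replicas, and any quantization.

```python
# Back-of-envelope estimate of raw dense vector storage.
# Only the document count comes from the abstract; everything else is assumed.

BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_docs: int, dimensions: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to store one vector per document, excluding index overhead."""
    return num_docs * dimensions * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    num_docs = 100_000_000  # "100M+" documents, per the abstract
    dimensions = 768        # assumed embedding size, not stated in the abstract
    total = raw_vector_bytes(num_docs, dimensions)
    print(f"{to_gib(total):.1f} GiB of raw float32 vectors")
```

Under these assumptions the vectors alone run to hundreds of gigabytes, which suggests why adding hardware gets expensive quickly.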

This talk will give a brief introduction to how Apache Solr/Lucene builds dense vector indexes, walk through how we optimized our dense vector setup, and highlight the pitfalls and best practices we learned along the way.

Whether you’re a company building out full RAG pipelines or an enthusiast playing around with a novel alternative to standard lexical search, you’re going to want to squeeze the most performance out of your limited compute resources. Let us help you hit the ground running.


Tags:

Search, Scale, Stories

Level:

Intermediate

Kevin Liang is a software engineer on Bloomberg's Search Infrastructure engineering team in New York. As part of a team that offers search-as-a-managed-service, he works closely with Apache Solr day in and day out. This includes everything from the application-level software down to the bare-metal hardware. Recently, his work has focused increasingly on support for dense vector search, but has also covered a variety of subjects, including automation, backups, and major version upgrades.