Berlin Buzzwords 2024

Under the hood of vector search with JVector
2024-06-11, Maschinenhaus

This is a speedrun through the past ten years of R&D on approximate nearest-neighbor search algorithms and vector databases, covering the major advances, the current state of the art, and possible future directions. The talk draws on Joel's experience contributing to JVector, an embedded vector search library for Java.


  • ANN (approximate nearest neighbor) vector search is totally different from the problems that we're used to solving with databases. You can't just throw a B-tree at it and call it a day.
  • It's a super young field: arguably the most important breakthrough (HNSW) was published only eight years ago. The best commercial products have started to ship designs based on DiskANN, which is five years old.
  • Nobody really knows the best way to solve several important problems: how to build indexes larger than memory, how to partition across machines, and how to combine indexes to "garbage collect" obsolete data without a full rebuild.
  • The importance of ANN to RAG (retrieval-augmented generation) in generative AI is supercharging interest in the field, and we should expect the state of the art to advance quickly.
See also: Slides (1.6 MB)

Joel Knighton is a Software Engineer at DataStax, working on vector search. He is a committer to JVector and Apache Cassandra. He has spent the last decade developing and operating large-scale databases.