Introducing Multi-valued Vector Fields in Apache Lucene
2023-06-19, Kesselhaus

Supporting multiple vectors in a field dedicated to k-nearest-neighbors (KNN) search has long been an open problem for Apache Lucene.
This talk describes how the feature was finally designed and implemented.


Since native vector-based search was introduced in Apache Lucene, many features have been developed, but support for multiple vectors in a dedicated KNN vector field remained unexplored.
Indexing (and searching) multiple values per field makes it possible to work with long textual documents by splitting them into paragraphs and encoding each paragraph as a separate vector, a scenario that many businesses encounter.
This talk explores the challenges, the technical design, and the implementation work behind this contribution to the Apache Lucene project.
Attendees can expect to come away with an understanding of how multi-valued fields work in a vector-based search use case and how the feature has been implemented.
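
To make the paragraph-per-vector scenario concrete, here is a minimal, library-free sketch in Python. It does not use Lucene's API and is not the design presented in the talk. The `toy_encode` function is a hypothetical stand-in for a real embedding model, and scoring a document by its best-matching paragraph vector (max aggregation) is an assumption chosen purely for illustration.

```python
import math

def split_paragraphs(text):
    """Split a document into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def toy_encode(paragraph, vocab=("lucene", "vector", "search", "pizza")):
    """Hypothetical encoder: counts of a few fixed words (not a real model)."""
    words = paragraph.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def document_score(query_vec, paragraph_vecs):
    """Score a multi-valued document by its best-matching vector (assumed max aggregation)."""
    return max((cosine(query_vec, v) for v in paragraph_vecs), default=0.0)

# One document, two paragraphs, hence two vectors stored for the same field.
doc = "lucene vector search\n\npizza recipes"
vectors = [toy_encode(p) for p in split_paragraphs(doc)]
query = toy_encode("vector search")
score = document_score(query, vectors)
```

With a single vector per document, the off-topic second paragraph would dilute the representation; with one vector per paragraph, the relevant paragraph can match the query on its own.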

See also: Slides (2.5 MB)

Following his passion, he entered the Apache Lucene and Solr world in 2010, becoming an active member of the community and an Apache Lucene/Solr committer and PMC member.
Experience with a great variety of clients has taught him to be a proficient and professional consultant.
Recently, Alessandro has contributed Neural Search to Apache Solr and worked on integrating Apache Solr’s Learning to Rank into various company ecosystems with the aim of improving search result relevancy.
Prior to that, he designed and developed an enterprise semantic search engine known as Sensefy, using approaches such as Named Entity Recognition at indexing time, advanced autocompletion, and document similarity metrics.