Berlin Buzzwords 2024

The Unsung Hero of Vector Database -- Metric Learning
2024-06-11, Palais Atelier

We all know there are several vector databases, and Andrej Karpathy once said that even an array can do the same job, which is true, but not the whole truth. A vector database provides the infrastructure to store embeddings, but how those embeddings are created is the most innovative part of all.


This talk is about the unsung hero of vector databases. I am talking about a machine learning concept, not one of the companies (because there are so many).
1. Metric learning: what is metric learning?
2. The problem without metric learning, shown mostly through an example of negation: cosine similarity does not handle negation well.
3. How to train metric learning embeddings.
4. The three ingredients: data, model, and loss function.
5. Data: what anchor, positive, and negative examples are.
6. Model: Siamese networks, since different kinds of data need different architectures.
7. Loss: the loss function plays a big role, covering triplet loss and contrastive loss.
8. Demo of how metric learning improved the overall experience of working with negations.
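
To make the data, model, and loss items of the outline concrete, here is a minimal NumPy sketch. It is not the code from the talk: the `encode` function is a toy stand-in for a Siamese network (one shared linear layer applied to every input), and the margins, dimensions, and random inputs are assumptions chosen only for illustration.

```python
import numpy as np

def encode(x, W):
    """Toy shared encoder (assumption): linear map, then L2 normalisation.
    In a Siamese setup the same weights W are used for every input."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor towards the positive and push it away from the
    negative until the two distances differ by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def contrastive_loss(a, b, is_similar, margin=1.0):
    """Pairwise loss: similar pairs (label 1) are pulled together,
    dissimilar pairs (label 0) are pushed at least `margin` apart."""
    d = np.linalg.norm(a - b, axis=-1)
    pull = is_similar * d ** 2
    push = (1.0 - is_similar) * np.maximum(margin - d, 0.0) ** 2
    return (pull + push).mean()

# Hypothetical batch of 4 triplets with 8 input features and 4 embedding dims.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x_anchor, x_pos, x_neg = (rng.normal(size=(4, 8)) for _ in range(3))

# The same encoder (same W) embeds anchor, positive, and negative.
a, p, n = encode(x_anchor, W), encode(x_pos, W), encode(x_neg, W)

print("triplet loss:", triplet_loss(a, p, n))
print("contrastive loss:", contrastive_loss(a, p, np.ones(4)))
```

In a real training loop the encoder weights would be updated by gradient descent on one of these losses. For the negation case, a sentence and its negation would typically be supplied as anchor and negative, so that the model learns to place them far apart.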

See also: Slides (2.9 MB)

Sonam is the creator of the open-source library Embed-Anything, which helps create local and multimodal embeddings. She previously worked at Qdrant on RAG, and before that at Rasa on conversational and generative AI. Earlier, she was an AI researcher at Saama and has worked extensively on clinical trial analytics with Pfizer. She is passionate about topics such as biases in language models. She has also published a paper at COLING, one of the most reputed venues in computational linguistics, available in the ACL Anthology.