Berlin Buzzwords 2024

Jina Embeddings V2: From Raw Data to Bilingual Hybrid Search
2024-06-11, Palais Atelier

Embeddings transform text into numerical vectors that capture semantic relationships. This talk covers data preparation, training, and evaluation for Jina Embeddings V2, and includes a demo of the model in a hybrid search pipeline to showcase its practical applications.


In this talk, we explore the design, training, and application of the bilingual Jina Embeddings V2, a state-of-the-art German-English embedding model built here in Berlin. Traditional exact-match and term-based retrieval methods can miss documents that express the query's meaning in different words, so we apply the bilingual model in a hybrid search setup. Combining vector-based search with conventional BM25 search draws on the strengths of both approaches and markedly improves search results. Participants will learn how embedding models are trained, how data for these models is sourced and prepared, and how to integrate our open-source German-English bilingual model into a search pipeline. The talk is aimed at anyone interested in the latest search and retrieval technologies and offers practical knowledge on improving search systems with embeddings.
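To make the hybrid idea concrete, here is a minimal sketch of one common way to merge a BM25 ranking with a vector-based ranking: reciprocal rank fusion (RRF). The abstract does not say how the talk's demo combines the two result lists, so RRF is an assumption, as are the small BM25 implementation, the example documents, and the hand-written 3-dimensional vectors, which stand in for real model output and were not produced by Jina Embeddings V2.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would use a proper analyzer.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Plain BM25 over a tiny in-memory corpus (illustrative only).
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ranking(scores):
    # Document indices ordered from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking contributes 1 / (k + rank) per document; k=60 is a common default.
    fused = {}
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

docs = [
    "Passwort zurücksetzen im Kundenkonto",     # German, no term overlap with the query
    "Reset your account password in settings",
    "Hiking trails around Berlin",
    "password rules for new employees",
]
query = "reset password"

# Hypothetical toy vectors, NOT real model output.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [
    [0.85, 0.15, 0.05],
    [0.8, 0.2, 0.1],
    [0.0, 0.2, 0.9],
    [0.5, 0.7, 0.1],
]

bm25_ranking = ranking(bm25_scores(query, docs))
dense_ranking = ranking([cosine(query_vec, v) for v in doc_vecs])
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # -> [1, 0, 3, 2]
```

In this toy setup, BM25 alone gives the German document a score of zero because it shares no terms with the English query, while the vector ranking places it first. After fusion it moves ahead of the document that matches only on the word "password", which is the kind of effect a bilingual embedding model is meant to provide in a hybrid pipeline.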

Bo Wang is an Engineering Manager at Jina AI, where he heads the machine learning team, focusing on enhancing search capabilities. Previously, he contributed to jina-embeddings (cutting-edge text embedding models) and to Finetuner (a cloud platform for fine-tuning embedding models). Bo earned his master's degree in Computer Science from Delft University of Technology in the Netherlands.

Over the past three years, Isabelle has made Berlin her home and the hub of her professional growth, pursuing a passion for the intersection of language and technology. She holds a master's degree in Computational Linguistics, which led her into language processing and to Jina AI. Since joining the company two years ago as a Machine Learning Engineer, she has played a pivotal role in developing and training text embedding models, working closely with her team. Beyond her technical contributions, she is committed to sharing her knowledge of and enthusiasm for the field: giving talks on machine learning and NLP has become a significant and fulfilling part of her career, enabling her to inspire and connect with others who share her interests.