2023-06-19, Maschinenhaus
Deep learning for search has become a hot topic, but pre-trained neural nets often do not work as well as expected. We will discuss the algorithms behind model fine-tuning and how to scale it up.
Deep learning for search has become a hot topic in recent years. It enables users to search based on semantics, search based on visual similarity, and conduct cross-modal and multimodal searches.
Though promising, it is non-trivial to put deep neural nets inside your system and expect them to work out of the box. In fact, in most cases, they don't. The reasons fall into three categories: task shift, domain shift, and knowledge shift.
Firstly, most deep learning models are trained to minimize a classification, regression, or segmentation loss rather than a search loss (task shift). Secondly, the dataset on which the model was trained can be quite different from the data you are working with (domain shift). Last but not least, we have observed a notable knowledge gap between search engineers and machine learning engineers (knowledge shift).
In this talk, we would like to gently guide the audience into the neural search world and discuss the motivation behind model tuning. Then, we'll discuss the algorithmic frameworks behind model fine-tuning, such as deep metric learning, contrastive learning, and self-supervised learning. Finally, we'll talk about the infrastructure behind a mature training service and how we can scale it up.
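To make the difference between a classification loss and a search-oriented loss more concrete, here is a minimal sketch of a triplet loss, one common objective in deep metric learning. It is only an illustration under stated assumptions, not the method presented in the talk or implemented in Finetuner: the function names, the cosine distance, the margin of 0.2, and the toy two-dimensional embeddings are all made up for this example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (hypothetical helper for illustration).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings with made-up values: a query, a relevant item, an irrelevant item.
query = [1.0, 0.0]
relevant = [0.9, 0.1]
irrelevant = [0.0, 1.0]

# The relevant item is already much closer to the query: no penalty.
well_ranked = triplet_loss(query, relevant, irrelevant)

# Swapping the roles simulates a model that ranks the wrong item first.
badly_ranked = triplet_loss(query, irrelevant, relevant)

print(well_ranked, badly_ranked)
```

Unlike a classification loss, this objective only looks at relative distances between embeddings, which is the quantity a search system ranks by. Fine-tuning with such a loss pushes relevant items closer to the query than irrelevant ones.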
We believe the topic will interest the Berlin Buzzwords audience since it covers several of the conference tags: search, data science, and scale. After the 40-minute talk, the audience should understand:
1. What neural search is and why it is important.
2. The algorithms to improve pre-trained neural nets for single-modality and cross-modality search.
3. Our tech stack for scaling up the training platform.
I enjoy bringing machine learning into production at Jina.ai as Head of Engineering. The combination of high-quality engineering, digging into data, and the real-world problem at hand thrills me.
Bo Wang is a senior machine learning engineer who leads the development of Finetuner. He received his BSc from Lanzhou University, China, and his MSc from TU Delft, the Netherlands, with a background in multimedia information retrieval. He is a core developer of the first-wave semantic search framework MatchZoo and a developer of Jina Core and DocArray.