Connect GPT with your data: Retrieval-augmented Generation
2023-06-20, Maschinenhaus

Learn how to build with LLMs such as ChatGPT and avoid typical pitfalls like hallucination and outdated information, accompanied by practical code examples using the open-source framework Haystack.


Large Language Models (LLMs), like ChatGPT, became the poster child of AI overnight. They changed how people search the web, how they write content, and how they code. These models have billions of parameters they can use to effectively store some of the information they saw during pre-training. This enables them to show deep knowledge of a subject, even if they weren't explicitly trained on it.

Yet, it’s not straightforward to use LLMs in enterprise use cases and embed them successfully in your product.

The most common challenges with LLMs are:
1) They don't know anything about YOUR data
2) Their knowledge is not up to date
3) They hallucinate, and it's hard to tell which sources their answers are based on
4) It’s hard to assess their performance

In this talk, you will learn how to deal with all of the above challenges. We will demonstrate how to connect LLMs to your data and how to keep them up to date using retrieval-augmented generation. We will show how to design prompts that minimize hallucination and how to evaluate the performance of your NLP application by collecting end-user feedback. We will also share best practices for development workflows and point out typical traps along the way.

Each step will be accompanied by practical code examples using the open source framework Haystack. By the end of the talk, you will not only know the methods to overcome the above challenges but also have code examples at hand that let you kickstart the development of your own NLP features.
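
To give a rough feel for the retrieve-then-prompt idea before the talk, here is a minimal plain-Python sketch. It is not Haystack's API and not the code shown in the talk: the word-overlap scoring, the function names (tokenize, retrieve, build_prompt), and the sample documents are all assumptions made for illustration. A real system would typically use a proper retriever and send the resulting prompt to an LLM.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever (assumption): score each document by how often
    the query's tokens occur in it, then keep the best top_k matches."""
    query_tokens = set(tokenize(query))
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        score = sum(counts[token] for token in query_tokens)
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, context_docs: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved documents
    and gives it an explicit way out when the answer is missing."""
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1)
    )
    return (
        "Answer the question using only the numbered documents below. "
        "Cite the document number you used. "
        "If the documents do not contain the answer, reply 'I don't know'.\n\n"
        f"Documents:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical sample data standing in for "your data".
documents = [
    "The office in Berlin opens at 9 am on weekdays.",
    "Expense reports must be submitted by the 5th of each month.",
    "The cafeteria serves lunch from 12 to 2 pm.",
]
query = "When do expense reports have to be submitted?"
context_docs = retrieve(query, documents)
prompt = build_prompt(query, context_docs)
print(prompt)  # this string is what would be sent to the LLM
```

Because the documents are looked up at query time, updating the document list is enough to keep answers current, and the numbered context makes it easier to trace an answer back to its source.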

See also: Slides (1.4 MB)

Malte is Co-Founder & CTO at deepset, where he builds Haystack, an open-source framework that lets you quickly build production-ready NLP services for semantic search, question answering & more. He holds an M.Sc. with honors from TU Munich and conducted research at Carnegie Mellon University. Before founding deepset, he worked as a data scientist for multiple startups. He is an open-source lover, likes reading papers before breakfast, and is obsessed with automating the boring parts of our work.