Semantic vs keyword search as context for GPT
06-20, 11:30–11:50 (Europe/Berlin), Palais Atelier

If you want to build a chatbot like ChatGPT on your own data, you need to use search to provide the context. Usually semantic search is used, but we've found that keyword search has some advantages.


OpenAI's ChatGPT has taken the world by storm, and many people want to offer the same kind of chatbot experience on their own data. Such a bot can answer questions based on your documentation or knowledge base.

This can be done with the OpenAI API by providing the right context, extracted from your data, to the model. You can do this in two steps:

  • the search step: perform a search to select the documentation pages that are likely to contain the answer.
  • the GPT step: provide these pages as context, with a prompt like "With this context: .... answer this question: ...".
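The two steps above can be sketched in a few lines of Python. The `search` function here is a hypothetical stand-in for whatever search engine you plug in (it just counts keyword overlap), and the prompt template mirrors the one quoted in the text; this is a minimal illustration, not a production implementation.

```python
def search(query: str, pages: list[str], n: int = 2) -> list[str]:
    # Placeholder ranking by naive keyword overlap; a real system would
    # use semantic search or a keyword engine such as BM25 here.
    words = query.lower().split()
    scored = sorted(pages, key=lambda p: -sum(w in p.lower() for w in words))
    return scored[:n]

def build_prompt(query: str, context_pages: list[str]) -> str:
    # The GPT step: stuff the retrieved pages into the prompt as context.
    context = "\n---\n".join(context_pages)
    return f"With this context:\n{context}\nanswer this question: {query}"

pages = [
    "Xata supports full-text search out of the box.",
    "Billing is monthly and based on usage.",
]
query = "How does search work?"
prompt = build_prompt(query, search(query, pages))
```

The resulting `prompt` string is what you would send to the OpenAI API in the GPT step, alongside your usual system and user message structure.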

For the search step, semantic search is often used, because it leverages the LLM's embedding capabilities. However, we have found that in practice keyword search (e.g. BM25-based) has some advantages when it comes to tuning the search step, and it tends to be more "explainable".
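To make the "explainable" point concrete, here is a from-scratch sketch of Okapi BM25 scoring (with the conventional k1 and b parameters). Unlike an opaque embedding similarity, each score decomposes into per-term tf/idf contributions that you can inspect and tune. This is a toy implementation for illustration, not what any particular engine ships.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n  # average document length
    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher idf weight.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            t = tf[term]
            # Term frequency, saturated by k1 and normalized by doc length via b.
            score += idf * t * (k1 + 1) / (t + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [
    "serverless database with built in search",
    "image transformations for the modern web",
]
scores = bm25_scores("serverless search", docs)
```

Because the contribution of every query term is an explicit number, you can see exactly why a page ranked where it did, and nudge k1 or b to change the behavior.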

See also: Slides (964.1 KB)

Tudor is CTO at Xata, a modern serverless database that provides extra data functionality like AI, search, or image transformations. Previously, he worked at data companies like Elastic and Oracle.