Berlin Buzzwords 2024

From Natural Language to Structured Solr Queries using LLMs
2024-06-10 , Maschinenhaus

We explore the use of AI, especially NLP techniques and LLMs, to make data in Apache Solr more accessible. We propose translating natural language queries into structured Solr queries using an LLM and the index metadata, to improve search and user experience. We’ll discuss the results and future directions.


This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User Experience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive”) gap remains between the data user's needs and the data producer's constraints.
That is where AI, and most importantly Natural Language Processing and Large Language Model techniques, could make a difference. A natural-language, conversational engine could facilitate access to and usage of the data by leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index's metadata.
This approach leverages the LLM's ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering users a transparent, automatic experience and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the future work needed to improve the approach and make it production-ready.
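The translation step described above can be illustrated with a minimal Python sketch. It is not the speakers' implementation: the schema, the prompt wording, the JSON answer format, and the `fake_llm` stub (which stands in for a real model call) are all assumptions made for illustration. Only the `q` and `fq` parameter names come from Solr itself.

```python
import json
from typing import Callable, Dict
from urllib.parse import urlencode

# Hypothetical index metadata: field name -> short description for the prompt.
SCHEMA: Dict[str, str] = {
    "title": "text, document title",
    "category": "string, one of: report, dataset, article",
    "year": "integer, publication year",
}


def build_prompt(question: str, schema: Dict[str, str]) -> str:
    """Describe the available fields so the model can only pick from them."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Translate the user request into a Solr query.\n"
        "Use only these fields:\n" + fields + "\n"
        'Answer with JSON: {"q": "...", "fq": ["field:value", ...]}\n'
        "Request: " + question
    )


def to_solr_params(llm_output: str, schema: Dict[str, str]) -> dict:
    """Parse the model's JSON answer and drop filters on unknown fields.

    Only the filter queries are checked here; the main query string is
    passed through as-is, which a production system would also validate.
    """
    parsed = json.loads(llm_output)
    filters = []
    for fq in parsed.get("fq", []):
        field, sep, _ = fq.partition(":")
        if sep and field.strip() in schema:
            filters.append(fq)
    return {"q": parsed.get("q") or "*:*", "fq": filters}


def translate(question: str, llm: Callable[[str], str],
              schema: Dict[str, str] = SCHEMA) -> dict:
    """Natural language in, Solr request parameters out."""
    return to_solr_params(llm(build_prompt(question, schema)), schema)


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; returns a canned answer
    that includes one filter on a field missing from the schema."""
    return json.dumps({
        "q": "title:climate",
        "fq": ["year:[2020 TO *]", "author:smith"],
    })


params = translate("climate reports since 2020", fake_llm)
query_string = urlencode(params, doseq=True)  # ready to append to a /select URL
print(params)
print(query_string)
```

Checking the model's output against the index metadata before querying is one simple way to guard against fields the model invents; here the `author` filter is discarded because the assumed schema has no such field.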

See also: Slides (1.5 MB)

After initial experience in the healthcare sector, and believing strongly in the power of Big Data and Digital Transformation, Ilaria earned a Master's in Data Science.
Since joining the Sease team in 2020, she has gained a diverse range of experience through projects related to Machine Learning and Natural Language Processing for Information Retrieval systems.
Ilaria has been working on integrating Learning To Rank and Search Quality Evaluation in e-commerce ecosystems, with the goal of improving their performance and the relevance of search results.
Additionally, she is an active member of the information retrieval research community, regularly sharing her knowledge through blogs and talks, contributing to open-source projects, and participating in international conferences, such as Berlin Buzzwords and ElasticON.

Anna has had a passion for Information Retrieval since her university years. A graduate of the University of Padua, where she wrote her computer science master’s dissertation on Entity Search, Anna has been working as a Search Consultant at Sease since 2019.
She actively works to support clients in the process of improving their search engines with the implementation of innovative personalized solutions.
She specializes in the integration of machine learning techniques with information retrieval systems, from Learning to Rank to Neural Search and Recommender Systems. She has worked extensively on e-commerce websites, improving their performance by developing personalized models and evaluation systems.
Anna strongly believes in innovation and research, keeping up to date with the latest academic studies and contributing to them. She participated in the European Conference on Information Retrieval (ECIR) 2022 with a poster on offline and online evaluation in industry, and published a paper on improving interleaving techniques for the evaluation of information retrieval systems at ECIR 2023.