Berlin Buzzwords 2024

Alessandro Benedetti is an Apache Lucene/Solr committer and PMC member, Director and R&D Software Engineer at Sease Ltd.
His focus is on R&D in Information Retrieval, Information Extraction, Natural Language Processing, and Machine Learning.
He firmly believes in Open Source as a way to build a bridge between Academia and Industry and facilitate the progress of applied research.
Experience with a great variety of clients has taught him to be a proficient and professional consultant.
When he isn't on clients' projects, he is actively contributing to the open-source community and presenting the applications of leading-edge techniques in real-world scenarios at meet-ups and conferences such as ECIR, the Lucene/Solr Revolution, Community Over Code (ex ApacheCon), Haystack, FOSDEM, Berlin Buzzwords, and Open Source Summit.

Hybrid Search with Apache Solr Reciprocal Rank Fusion

Alexander Butenko

I'm a senior MLOps engineer at GetYourGuide.

From Text to Context: How We Introduced Hybrid Search

Alexander Weiss

Dr. Alexander Weiss is a Senior Data Science Manager at GetYourGuide, overseeing the Data Product teams within the Marketplace division. Holding a Ph.D. in Probability Theory from TU Berlin, he brings more than ten years of expertise in data science and analytics. While his teams primarily focus on developing data products in the realms of search and relevance, Alexander harbors a deep passion for experimentation statistics. He is passionate about empowering organizations to make data-driven decisions and is excited to share his insights on sequential testing at the upcoming conference. His hobbies are on hold until the children have moved out.

Sequential Testing Simplified with Basemath

Andreas Wagner

Andreas Wagner is a serial entrepreneur who is passionate about teaching software to comprehend human behavior. With more than 15 years of extensive experience in the retail and eCommerce industries, he has held various positions in different companies. Currently, he is serving as the CTO and strategist at searchHub.io, where he continues to pursue his mission of improving software's ability to understand human needs.

Improving Search @scale with efficient query experimentation

Anna Ruggero

Anna has had a passion for Information Retrieval since university. She graduated from the University of Padua with a computer science master’s degree, writing her dissertation on Entity Search, and has been working as a Search Consultant at Sease since 2019.
She actively works to support clients in the process of improving their search engines with the implementation of innovative personalized solutions.
She specializes in the integration of machine learning techniques with information retrieval systems, from Learning to Rank techniques to Neural Search and Recommender Systems. She has worked extensively on e-commerce websites, improving their performance by developing personalized models and evaluation systems.
Anna strongly believes in innovation and research, keeping up to date with the latest academic studies and contributing to them. She participated in the European Conference on Information Retrieval (ECIR) 2022 with a poster on offline and online evaluation in industry, and published a paper on improving interleaving techniques for the evaluation of information retrieval systems at ECIR 2023.

From Natural Language to Structured Solr Queries using LLMs

Ansgar Gruene

Ansgar Gruene, Ph.D., is a Senior Data Scientist at GetYourGuide in Berlin. His work focuses on ML approaches to improve users' search and discovery experience on the platform. He holds a Ph.D. in Theoretical Computer Science and has several years of experience as a Backend Engineer and Data Scientist in the travel industry.

From Text to Context: How We Introduced Hybrid Search

Ashish Khatkar

Ashish is a Software Engineer and Tech Lead in the Stream Processing team at Yelp. He is currently focusing on modernizing the streaming and data infrastructure at Yelp.

Evolving Yelp's Streaming at Scale

Atita Arora

Atita Arora is a seasoned and esteemed professional in information retrieval systems and has decoded complex business challenges, pioneering innovative information retrieval solutions in her 15-year journey as a Solution Architect / Relevance strategist / Individual Contributor.
She has a robust background from her impactful contributions as a committer in various information retrieval projects.
She has a keen interest in making tech innovations accessible and implementable to solve real-world problems.
She is currently immersed in researching the evaluation of RAG systems while navigating the world of vectors and LLMs, seeking to uncover insights that can enhance their practical applications and effectiveness.

Cracking the Code: Deciphering Evaluation Essentials for RAG

Bilge Yücel

Bilge is a Developer Relations Engineer at deepset, working with Haystack, an open source LLM framework. With over two years of experience as a Software Engineer, she developed a strong interest in NLP and pursued a master's degree in Artificial Intelligence at KU Leuven with a focus on NLP. Now, she enjoys working with Haystack, writing blog posts and tutorials, and helping the community build LLM applications. ✨ 🥑

Improve LLM-based Applications with Fallback Mechanisms

Bo Wang

Bo Wang is an Engineering Manager at Jina AI, where he heads the machine learning team, focusing on enhancing search capabilities. Previously, he contributed to jina-embeddings, cutting-edge text embedding models, and Finetuner, a cloud platform for fine-tuning embedding models. Bo earned his master's degree in Computer Science from Delft University of Technology, the Netherlands.

Jina Embeddings V2: From Raw Data to Bilingual Hybrid Search

Branimir Lambov

Core database engineer for Apache Cassandra

Applications of Tries in Apache Cassandra

Bryan Burkholder

Bryan Burkholder is a Staff Software Engineer at Slack, focused on improving observability adoption across the engineering organization. Recently this work has centered around developing an open-source log search and analytics engine that can handle petabyte scale in a cost-effective manner.

Standing on the Shoulders of Giants

Charles Njoroge

Charles is a skilled technologist and innovator with a solid academic background from Brown University. With experience in both software engineering and finance, Charles has a proven track record of driving impactful changes in various fields.

At Zillow, Charles played a key role in the complete rebuild and redesign of the Zillow Search platform over several years, significantly improving its functionality and user experience. This achievement highlighted Charles' expertise in developing scalable, high-performance systems.

After transitioning to fintech, Charles used their technical skills to create one of the industry's first regulatory dispute products, marking a notable advancement in the regulatory technology landscape.

Currently, Charles is at the forefront of innovation at Reddit, contributing to Reddit's core infrastructure, specializing in Kubernetes and Solr-based solutions. Their work is focused on enhancing the performance and reliability of Reddit Search.

With a dedication to excellence and a forward-thinking approach, Charles continues to explore new possibilities in technology. Outside of work, Charles is an avid reader and an open water long-distance swimmer.

Learning to Rank for Reddit Search - A Project Retro

Chingis Oinar

Chingis is a Machine Learning engineer at Mercari, Japan’s largest C2C marketplace. Chingis spearheads the development of features and the ongoing optimization of the ML search ranking system, paying particular attention to model and feature de-biasing. Additionally, Chingis is designing and validating robust model evaluation metrics using implicit feedback. He is also building solutions that leverage machine learning embedding models to enhance search feature development.

Robust AI Search Ranking for Radical C2C Marketplace Growth

Chinmay Soman

Chinmay Soman is the head of Product at StarTree, building the next generation real time analytics platform for companies of all sizes. Previously he led the streaming platform team at Uber for building a large-scale, self-serve platform around messaging, stream processing and OLAP technologies. Before that, he worked at LinkedIn and IBM, focussing on distributed systems and security. He’s a PMC member of Apache Samza and a committer on Apache Pinot, Voldemort, uReplicator and AthenaX.

Can Apache Pinot replace your OLTP database?

Chloe He

Chloe has a background in data science and started working on streaming systems when she was a Founding Engineer at Claypot AI, a startup tackling challenges in real-time machine learning. She led the infrastructure development of an open-source real-time feature engineering platform and worked on translating and optimizing streaming workloads that served low-latency use cases. Later, she brought her streaming expertise to Voltron Data, where she now leads the development of streaming technologies.

Streaming doesn’t have to be hard

Chloé Caron

Chloé is a Tech Lead, Data Engineer and Full-stack Developer at Theodo UK. Having worked on several projects, Chloé has continuously expanded her experience by working with both startups and well-established companies. With a keen curiosity for exploration, Chloé frequently embarks on exploratory journeys into innovative data topics, journeys that she often shares through Twitter.

Can ChatGPT build a Data Platform faster than a developer?

Corey J. Nolet

Corey is a principal engineer on the RAPIDS ML team at NVIDIA, where he builds machine learning algorithms that support extreme data loads at light speed. Prior to joining NVIDIA 5 years ago, Corey spent over a decade building massive-scale exploratory data science and real-time analytics platforms for big-data and HPC environments in the defense industry. Corey holds B.S. and M.S. degrees in Computer Science. He is also finishing up his Ph.D. in the same discipline, focused on the acceleration of algorithms at the intersection of graph and machine learning. Corey has a passion for using data to make better sense of the world.

cuVS and Lucene: GPU-based Vector Search

Danica Fine

Danica Fine is a Staff Developer Advocate at Confluent where she helps others get the most out of Kafka and their event-driven pipelines. In her previous role as a software engineer on a streaming infrastructure team, she predominantly worked on Kafka Streams- and Kafka Connect-based projects to support computing financial market data at scale. She can be found on Twitter, tweeting about tech, plants, and baking @TheDanicaFine.

Brick-by-Brick: Exploring the Elements of Apache Kafka®

Daniele Antuzi

Daniele Antuzi is a software engineer passionate about high-performance data structures and algorithms. He worked for 4 years in finance (List Spa) and 2 years in cloud services (Amazon Web Services), but his curiosity to learn more about information retrieval led him to join Sease Ltd.
He likes studying and experimenting with new technologies, trying to reduce the gap between academia and industry.

Blazing-Fast Serverless MapReduce Indexer for Apache Solr

Desmond Obisi

I am Desmond Obisi, a software engineer with over five years of experience building web products. I'm an open-source contributor and technical writer whose vision is to build great products and make them easy for the world to use. I am interested in UX, performance, and code interoperability. I am interested in things like blockchain, AI/ML, Cloud Native, and Research. You'll see me speaking and engaging communities if I'm not building products.

I currently contribute to Ansible as a documentation writer, to the CHAOSS Project as a maintainer, and to Ambassador Labs as a community advocate. Full-time, I work as a software engineer at Resilis.

Kafka on the Fly: A Serverless Approach to Data Streaming

Dharin Shah

Hello,

I am a senior engineer working in the Search team at GetYourGuide. I am responsible for all the infrastructure and data processing for search, which is exposed via generic APIs. I also have a deep interest in performance and databases in general, and I have past experience contributing to OpenSearch. I enjoy reading technical white papers, as well as reading more about current AI trends in general.

From Text to Context: How We Introduced Hybrid Search

Djordje Benn-Maksimovic

Djordje initially read for a Master's in Physics, then made a detour working for an American software enterprise before studying Economics and discovering his love for IT consulting and data science.
Presently at Eviden, he researches robust and secure machine learning, aiming to optimize the public sector's integration of recent artificial intelligence advancements.

Enhancing RAG with Neo4j Knowledge Graph

Doug Turnbull

Doug Turnbull has been enthusiastic about search relevance since 2013. He co-authored Relevant Search and AI Powered Search. He created Quepid and Splainer for search relevance testing. He co-created the Elasticsearch Learning to Rank plugin with Wikimedia Foundation and Snagajob. Doug loves learning from other search practitioners, and hopes you'll bring inquisitive curiosity and experiences to this talk.

Doug currently works at Reddit where he's helping bring Machine Learning to search. Recently Doug worked at Shopify to help improve merchant search-attributed revenue by 19% year over year. Doug spent 8 years helping dozens of organizations improve search relevance during his time as CTO at OpenSource Connections.

Doug blogs about search and other topics at http://softwaredoug.com

Learning to Rank for Reddit Search - A Project Retro

Emanuele Lapponi

I have been working with NLP for the last 15 years in academia, ML startups and finance. As further evidence that I am smart, I have a Ph.D. in Computational Linguistics. While I do enjoy solving challenging NLP tasks with both state-of-the-art and 'vintage' techniques, most of the time I'd rather be rollerblading.

Hardcoding airpods (and other stories from NLP in insurance)

Gergely Daroczi

Gergely Daroczi has been an enthusiastic R user and package developer for 20 years; Ph.D. in social sciences; former Assistant Professor in Sociology, currently Lecturer at the Business Analytics program of CEU; 15+ years of industry experience in data science, engineering, cloud infrastructure, and data operations at SaaS, fintech, adtech, and healthtech startups with a strong interest in building scalable data platforms. He maintains a dozen open-source packages related to using R in production (automated reports, logging, database connections, API integrations), has contributed to Python packages, co-authored several journal articles in social and medical sciences, and wrote the book "Mastering Data Analysis with R".

Harnessing Spare Cores to Breeze Through Cloud Compute

Hajer Bouafif

Hajer Bouafif is a solutions architect in Data Analytics and search with a background in Big Data engineering. Hajer provides organizations with best practices and well-architected reviews to build large-scale Machine Learning search solutions.

Rediscover your keyword search: Expand, Enrich and Rewrite

Hans-Peter Grahsl

Hans-Peter Grahsl is a Developer Advocate at Red Hat. He is an open-source community enthusiast and is particularly passionate about event-driven architectures, distributed stream processing systems and data engineering. For his code contributions, conference talks and blog posts at the intersection of the Apache Kafka and MongoDB communities, Hans-Peter received multiple community recognition awards and became one of the founding members of the MongoDB Champions Program in 2020. He has been a regular speaker at international tech and developer conferences for several years.

End-to-End Encryption for Streaming Data Pipelines

Hellmar Becker

Hellmar Becker is a Senior Sales Engineer at Imply. He has worked in data analytics for more than 20 years in various pre- and post-sales roles. Hellmar has worked with large customers in the finance, telco and retail industries, and spent several years at the big data company Hortonworks and, more recently, at Confluent.

Let's Do Data Lineage in Kafka, Flink and Druid!

Igor Canadi

Igor Canadi is a founding engineer and architect at Rockset working on the distributed SQL query engine and cloud-native architecture. Previously, Igor was an engineer at Facebook, working on the database engineering and product infrastructure teams, where he contributed to RocksDB, developed MongoRocks and MongoDB with RocksDB storage engine, drove RocksDB open source initiatives, worked on core GraphQL infrastructure for Facebook’s Android application, and owned GraphQL developer tooling for hundreds of developers. Igor holds a master’s degree in computer science from the University of Wisconsin-Madison and a bachelor’s degree from the University of Zagreb. In his free time, he likes sailing and snowboarding.

How we isolate streaming ingest from search using RocksDB

Ilan Ginzburg

Working on search architecture at Salesforce in Grenoble, France. Lucene/Solr committer.
Holds degrees in business administration and computer science engineering and a PhD in parallel computing.
Prior to Salesforce, worked at Intel, HP Labs in Palo Alto and EMC/Documentum among others. Long ago wrote the Apple II computer game “Saracen.”
When not in front of a screen I'm usually either drumming, biking in the Alps or paragliding above them.

Search in the Cloud: separation of compute and storage

Ilaria Petreti

After initial experience in the healthcare sector, and believing strongly in the power of Big Data and Digital Transformation, Ilaria earned a Master's in Data Science.
Since joining the Sease team (in 2020), she has gained a diverse range of experiences through projects related to Machine Learning and Natural Language Processing for Information Retrieval systems.
Ilaria has been working on integrating Learning To Rank and Search Quality Evaluation in e-commerce ecosystems, with the goal of improving their performance and the relevance of search results.
Additionally, she is an active member of the information retrieval research community, regularly sharing her knowledge through blogs and talks, contributing to open-source projects, and participating in international conferences, such as Berlin Buzzwords and ElasticON.

From Natural Language to Structured Solr Queries using LLMs

Isabelle Mohr

Over the past three years, Isabelle has made Berlin her home and hub of professional growth, nurturing a deep-seated passion for the intersection of language and technology. With a master's degree in Computational Linguistics, she has embarked on a journey into the complex world of language processing, leading her to Jina AI. Since joining the company two years ago as a Machine Learning Engineer, she has played a pivotal role in the development and training of text embedding models, working closely with her team to push the boundaries of what's possible. Beyond her technical contributions, she is passionately committed to sharing her knowledge and enthusiasm for the field; giving talks on machine learning and NLP has become a significant and fulfilling part of her career, enabling her to inspire and connect with others who share her interests.

Jina Embeddings V2: From Raw Data to Bilingual Hybrid Search

Jannik Heyl

Jannik has built the Business Unit Big Data & Analytics from the ground up to around 30 employees. He studied Computer Science and, despite his business role, is still an enthusiastic techie. Jannik has also been a lecturer at a local university for over 5 years, teaching in the computer science bachelor's degree program.

Lessons learned writing 10+ Kubernetes Operators

Jarek Potiuk

Independent Open-Source Contributor and Advisor, Committer and PMC member of Apache Airflow, Member of the Apache Software Foundation, Security Committee Member of the Apache Software Foundation

Jarek is an engineer with broad experience in many areas: open source, cloud, mobile, robotics, AI, backend, developer experience, and security. He also has a lot of non-engineering experience under his belt: building a software house from scratch, being a CTO, organizing big international community events, technical sales support, PR and marketing advisory, as well as the legal aspects of security, licensing, branding, and building open-source communities.

With experience in very small and very big companies and everything in between, Jarek found his place in the open-source world, where his individual-contributor drive can be used to its full potential.

`New` Workflow Orchestrator in town: "Apache Airflow 2.x"

Joel Knighton

Joel Knighton is a Software Engineer at DataStax, working on vector search. He is a committer to JVector and Apache Cassandra. He has spent the last decade developing and operating large-scale databases.

Under the hood of vector search with JVector

Josh Reed

Josh Reed is a Senior Software Engineer in the Release Engineering team at Aiven. He has spent the last 5 years of his career working to help teams achieve efficiency and quality in the development process. His motto is "Ship better software faster." He lives in Montreal, Canada with his wife, daughter, and cats. In his free time, he loves to play music, and to barbecue anything and everything that's good on a grill.

The “C” in CI/CD is not for “Closed”

Juliane Waack

Juliane is a Software Engineer in the Snowflake Berlin office, where she works on accelerating query performance by extending Snowflake’s pruning capabilities as part of the Search team. She holds an M.Sc. from Hasso Plattner Institute, Germany.

Accelerating TopK Queries

Kentaro Takiguchi

Kentaro is a Search ML engineer at Mercari, Japan’s largest C2C marketplace. While Kentaro is working on models, data pipelines, and evaluation for search, he is also exploring methods to effectively combine a complex search system, built on top of existing search technologies, with new technologies.

A Practical Approach To Semantic Search

Konrad Richter

Konrad Richter is a Senior Data Analyst at GetYourGuide, developing their internal experimentation platform and enabling product teams to make data-driven decisions. Previously he worked on pricing-related topics at Zalando and Mercedes-Benz, and holds a master's degree in Computational Data Analytics from the Georgia Institute of Technology.

Sequential Testing Simplified with Basemath

Lars Albertsson

Lars Albertsson is the founder of Scling, a data engineering startup based in Stockholm. Scling provides data-factory-as-a-service: customer-tailored data engineering, analytics, and data science. Lars is a frequent conference speaker on data engineering and data strategy. Before founding Scling, Lars worked at Google, Spotify, Schibsted, and as an independent consultant, helping organisations create business value from data processing and machine learning.

End-to-end pipeline agility

Lars Francke

Lars was a Big Data freelancer (when it was hip!) for 12 years before founding the consulting company OpenCore, followed by the product company Stackable, where he is currently the CTO. He's spoken at conferences worldwide, is an open-source enthusiast, a committer in various projects, a member of the Apache Software Foundation, and now relegated to doing mostly non-technical tasks by his team. He's a father of two and lives in Germany.

Lessons learned writing 10+ Kubernetes Operators

Lucian Precup

Lucian Precup is the CTO of all.site - the collaborative search engine developed at Station F in Paris. With his colleagues at Adelean, Lucian develops solutions for indexing, searching and analyzing data. Lucian regularly shares his knowledge in specialized conferences and organizes the Search & Data Meetup.

Apache Lucene: From Text Indexing to Artificial Intelligence

Luuk Kaandorp

My name is Luuk Kaandorp, 25 years old, and I've been working at Albert Heijn as a Data Scientist in the Search team for about 1.5 years. I previously did a Bachelor in Information Sciences at the Vrije Universiteit Amsterdam (VU), and a Master in Artificial Intelligence at the Universiteit van Amsterdam (UvA). During my master's, I specialised mainly in Natural Language Processing and Information Retrieval. After my master's thesis on Diversity in Personalised News Recommendations at RTL, I joined Albert Heijn as my first full-time job. Since then, I've been trying to apply all the knowledge I gained during my studies to the product search domain, to provide our customers with the most relevant search experience on our website and app.

The Power of the Bonus Card: Road to Personalised Search

Manish Saraswat

Manish is currently working as a Senior Data Scientist with a strong focus on building, deploying and serving models. With over nine years working on machine learning problems, he really enjoys building data products around improving search, ranking and recommendations. Outside work, he likes to do outdoor activities like running, swimming etc.

Better search relevance using Learning to Rank at mobile.de

Maryblessing Okolie

Maryblessing is a dedicated community architect and passionate advocate, with strong analytical and engagement skills in tech communities. She’s passionate about sustainable tech ecosystems with a commitment to championing diversity, inclusion, and belonging within workplaces and the broader community.

Recognizing the driving forces of impactful OS projects

Michael Dinzinger

Hello, I'm a PhD student at the University of Passau. As my work is related to web crawling, I'm interested in all new things related to Data Science and Information Retrieval.

Open Web Search - a platform for a free European Web Index

Mingshi Liu

Mingshi Liu is a Machine Learning Engineer / Software Development Engineer at AWS working on OpenSearch, mainly focusing on OpenSearch core and the ML Commons plugin, which provides machine learning (ML) features in OpenSearch.

Elevating AI Applications with OpenSearch's Flow Framework and RAG Tool

Murhaf Fares

Murhaf (he/him) has been working in NLP since 2011, both in academia and in the industry. He holds a PhD in NLP from the University of Oslo and currently works at Fremtind Insurance in Oslo, Norway. He has previously worked as an ML Engineer, data scientist and enterprise search consultant and participated in several research projects and initiatives during his years in academia.

Hardcoding airpods (and other stories from NLP in insurance)

Nick Burch

Nick began contributing to Apache projects in 2003, and hasn't looked back since! He's mostly involved in "Content" projects like Apache Tika, Apache POI and Apache Chemistry, as well as foundation-wide activities like Travel Assistance and Community Development.

Nick works running development teams for startups, but the open source stuff is usually more interesting!

Monitoring your home, with DevOps observability tools
Barcamp

Nicolas Fränkel

Developer Advocate with 15+ years of experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurance, large retail and public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Also doubles as a trainer and triples as a book author.

Practical introduction to OpenTelemetry tracing

Nils Larsgård

Nils is a long-time programmer with experience in database handling and streaming technologies. He has been working as a consultant for 14 years, helping clients handle data efficiently.

A journey in geospatial timeseries

Ohad Levi

A product and business expert, Ohad is a visionary product leader with over 15 years of experience in driving product strategies for disruptive technologies, building strong teams and engaging people around ideas. After leading product teams at Intel, HP and Click Software and launching enterprise-grade products in new markets, Ohad set out on a new journey. As the CEO and Founder of Hyperspace, Ohad is set on a mission to introduce an enterprise-grade, AI-search acceleration engine for companies making real-time predictions and facing performance and scalability challenges.

Shattering the Limits of Search with Domain Specific Computing

Olivier HUBER

For more than 25 years, I have been driven to create, to invent, and to acquire new skills.
I think one of the quickest and simplest routes between having an idea in our heads and actually making it happen is in the field of IT.
I love to share my findings, creations and experiences at diverse venues and conferences.
I’m constantly searching for new techniques to engage and interest my audience.
I believe that learning is supposed to be fun.
I’m a super passionate and high energy guy who loves to teach, speak, and train others in IT, soft skills, love languages and lots more!

Build your 8-bit computer from scratch

Owais Kazi

Owais is a Software Engineer at AWS, focusing on OpenSearch, OpenSearch plugins, and Generative AI applications. He is a maintainer of OpenSearch core, Anomaly Detection, and Flow Framework, as well as an active contributor to various Open Source projects.

Elevating AI Applications with OpenSearch's Flow Framework and RAG Tool

Pere Urbon Bayes

Pere is a Software Architect working for Confluent out of Berlin, Germany. He has been working with data and architecting systems for more than 15 years as a freelance engineer and consultant. In that role he was focused on data processing and search, helping companies build reliable and scalable data architectures. His work usually sits at the crossroads of infrastructure, data engineers and scientists. When not working, Pere loves to spend time with his lovely wife and kids, build Legos and enjoy Handball. These days he is finishing an M.Sc. in Computational Engineering and Maths, before jumping into a PhD.

Deep Learning plays Handball

Petr Polezhaev

Petr Polezhaev is a data scientist at SIXT, focusing on NLP and GenAI. Previously, he worked as a data scientist in technology companies, creating data products for industry and academia, including recommender systems and AI components for educational platforms.

Advancements in Evaluating Large Language Model Applications

Praveen Mohan Prasad

A search enthusiast actively researching and experimenting on using Machine Learning to improve relevance.

Rediscover your keyword search: Expand, Enrich and Rewrite

Radovan Bacovic

Radovan Bacovic is a Staff Data Engineer at GitLab coming from Novi Sad, Serbia.
Radovan is an experienced Data Engineer and “wanna-be” the best bad Conference speaker. Forever eager to discover new data technologies in an agile environment. He armoured himself with a profound application development background in large international companies around the globe, with a strong focus on the open-source community.
He is a passionate data geek delighted to share his long mileage and experience with a broader audience.
He has been trapped in the Data world for almost 20 years.
Passionate Brazilian Jiu-Jitsu practitioner.

Remote work is here to stay - and what's next?

Radu Gheorghe

Radu Gheorghe works mainly as a search consultant at Sematext, working with clients of all sizes on their Elasticsearch, OpenSearch and Solr projects. He is also a trainer and does production support for all these search engines.

Sometimes he helps out with the development of Sematext Cloud (an observability SaaS), mostly when it comes to Elasticsearch and log shippers (e.g. Logstash, rsyslog…). He also writes on the Sematext blog or helps others publish new articles.

He co-authored a book (Elasticsearch in Action, Manning), recorded a video tutorial (Working with Elasticsearch, O'Reilly) and was a speaker at a number of conferences, such as Berlin Buzzwords, LuceneSolrRevolution (later Activate) and Kubecon.

Heap sizing and GC tuning for Solr and friends

Rafał Kuć

Software engineer, trainer, consultant and author from time to time. Some would say that he is an all-in-one battle weapon concentrated mostly on Lucene, Solr and Elasticsearch. Currently an Engineering Lead at Archipelo. However, he also likes all the other cool stuff that is happening in the IT world. He likes to share his knowledge by giving talks at various meetups and conferences.

Heap sizing and GC tuning for Solr and friends

Saahil Ognawala

Saahil is the Head of Product at Jina AI, combining technical expertise in generative AI, machine learning, and search+reranking, with a strong foundation in SaaS product management. With over 7 years of experience in the field, he aims to leverage AI to transform how we interact with content, move in physical spaces, and create positive change.

Saahil has led the development and launch of multiple AI products, including multimodal models, knowledge graphs, and the application of ML to enhance security. His expertise extends through product lifecycle management, from conceptualization and deployment to go-to-market and commercialization.

Saahil studied Computer Science (M.Sc. and Ph.D.) at TU Munich.

Open-Source Generative AI: A Product Manager’s Blueprint

Sebastian Arnold

Sebastian Arnold graduated in Computer Science at TU Berlin and received his Ph.D. in 2020 for his thesis on Machine Reading for Domain-specific Text Resources from the University of Fribourg, Switzerland, in cooperation with the DATEXIS research group at Beuth University of Applied Sciences Berlin. From 2020 he led the data product development of Curalie GmbH (Fresenius Group) in the field of digital healthcare as Head of Data Science. In 2023 he joined Bayer AG Pharmaceuticals as Principal Data Scientist and co-leads the organization's cross-divisional platform for Generative AI.

Learning to Apply Generative AI at Enterprise Level

Sonam Pankaj

Sonam is the creator of the open-source library Embed-Anything, which helps to create local and multimodal embeddings. She previously worked at Qdrant on RAG and, before that, at Rasa on conversational and generative AI. Earlier, she worked as an AI researcher at Saama and has worked extensively on clinical trial analytics with Pfizer. She is passionate about topics like biases in language models. She has also published a paper at COLING, one of the most reputed venues in computational linguistics, available in the ACL Anthology.

The Unsung Hero of Vector Database -- Metric Learning

Stefan Sprenger

Stefan works as a staff software engineer at Confluent where he builds developer tooling for Kafka and other data streaming technologies. Previously, he co-founded a startup in the data streaming space, worked as a data engineer in the financial industry, and researched database systems on modern hardware. He loves Neapolitan pizza.

Taming the cost of Kafka workloads in the cloud

Stefana Serban

Data Science Lead for Search@eMAG, Stefana excels in leveraging machine learning techniques to refine relevance and user experience. Her data science expertise, coupled with a keen eye for detail, fosters rapid innovation cycles and adaptable and resilient strategies.

Synergy of Signals: Traffic Logs Meet LLM Labels

Stephan Ewen

Stephan is one of the original creators of Apache Flink and founder / CTO of dataArtisans/Ververica. Recently, he co-founded Restate (https://restate.dev/) with the goal of simplifying distributed application development, event-driven apps, and microservice architectures.

Fixing the Hard Bits of Event Processing with Restate & Kafka

Teo Narboneta Zosa

Teo is a machine learning engineer at Mercari, Japan’s largest C2C marketplace. As a founding member and technical lead of the AI search ranking team, Teo is working across various business-critical projects in the Search & Discovery group to solidify Mercari as a leader in Japanese e-commerce search.

Robust AI Search Ranking for Radical C2C Marketplace Growth

Tim Zöller

Tim founded the company lambdaschmiede GmbH. He helps his clients to digitalize their manual business processes with Java and is a co-founder of the Java Usergroup Mainz. In his free time, he accumulates new side projects with Java and Clojure and sometimes even finishes one of them.

Back to the Future! Time Travel with Bitemporal Databases

Timo Walther

Timo Walther is a Principal Software Engineer at Confluent and a long-time member of Apache Flink’s management committee. He studied Computer Science at TU Berlin and was part of the Database Group there - the origins of Apache Flink. He worked as a software engineer at DataArtisans and led the SQL team at Ververica. He was a Co-Founder of Immerok which was acquired by Confluent in 2023. In Flink, he is working on various topics in the Table & SQL ecosystem to make stream processing accessible for everyone.

Flink's SQL Engine: Let's open the engine room!

Tomáš Neubauer

Tomáš Neubauer is a co-founder and CTO at Quix, where he works as the technical authority for the engineering team and is responsible for the direction of the company across the full technical stack. He was previously technical lead at McLaren, where he led the architectural uplift of the real-time telemetry acquisition platform for the Formula 1 racing team.

In his spare time, Tomáš likes to go mountain biking in the hills around Prague.

Streaming DataFrames: A New Way to Process Streaming Data

Tudor Golubenco

Tudor is CTO at Xata, a Postgres platform that brings in extra features like branching, automatic replication to search, and schema migration improvements. Before Xata, Tudor worked at Elastic for several years.

Comparing vector implementations in generic databases

Tun Shwe

Tun Shwe is the VP of Data at Quix, where he leads data strategy and developer relations. He is focused on helping companies imagine and implement their strategic data vision with stream processing at the forefront. He was previously a Head of Data and Data Engineer at high growth startups and has spent his career leading T-shaped teams in developing analytics platforms and data-intensive AI applications.

In his spare time, Tun goes surfing, plays guitar and tends to his analogue cameras.

Moving from Offline to Online Machine Learning with River

Varun Thacker

Varun Thacker is a Staff Software Engineer at Slack, currently focused on log search for observability data. He is an Apache Lucene and Solr committer and Project Management Committee member. Previously, he worked on search at Slack and Lucidworks.

Standing on the Shoulders of Giants

Vincent Peijnenburg

Vincent Peijnenburg is a Data Scientist in the Search team at Albert Heijn, the largest supermarket chain in the Netherlands, both offline and online. He has been in this position for about one year and previously worked for Transavia (the Dutch airline that is part of the KLM group) for five years, working on recommender systems, pricing systems, and other ML applications in the commerce domain. He has experience with both the modeling and the MLOps side, delivering end-to-end solutions, and is always eager to try out the latest tools and techniques and apply them in a business context.

The Power of the Bonus Card: Road to Personalised Search

Vivek Narang

Vivek Narang is a software engineer at SearchScale and is currently working on building search-related tools and products.

cuVS and Lucene: GPU-based Vector Search

William Benton

William Benton is passionate about making it easier for machine learning practitioners to benefit from advanced infrastructure and making it possible for organizations to manage machine learning systems. His recent roles have included defining product strategy and professional services offerings related to data science and machine learning, leading teams of data scientists and engineers, and contributing to many open source communities related to data, ML, and distributed systems. Will was an early advocate of building machine learning systems on Kubernetes and developed and popularized the “intelligent applications” idiom for machine learning systems in the cloud. He has also conducted research and development related to static program analysis, language runtimes, cluster configuration management, and music technology.

Large language models are not a paradigm shift

Yingjun Wu

Yingjun Wu is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD degree from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.

S3 as the state store for stream processing systems

Zain Hasan

Zain Hasan is a senior ML developer advocate at Weaviate. An engineer and data scientist by training, he pursued his undergraduate and graduate work at the University of Toronto building artificially intelligent assistive technologies, then founded his company, VinciLabs, in the digital health-tech space. More recently he practiced as a consultant senior data scientist in Toronto. Zain is passionate about the fields of machine learning, education, and public speaking.

Advanced Retrieval-Augmented Generation Techniques

Zuzanna Warso

Zuzanna is the Director of Research at Open Future. She has over ten years of experience with human rights research and advocacy. In her work, she has focused on the intersection of science, technology, human rights and ethics.

Zuzanna spent more than eight years with the Helsinki Foundation for Human Rights, the most prominent human rights non-governmental organization in Poland, where she gained experience in advocacy and policy work.

She holds a Ph.D. in International Law and an M.A. in English Studies from the University of Warsaw. She has been involved in national and international interdisciplinary research and innovation projects, exploring the ethics of new technologies and their impact on human rights and freedoms. Before joining Open Future, Zuzanna cooperated with Trilateral Research, where she researched the ethical and human rights challenges posed by new and emerging technologies.

She was awarded a scholarship from the German Federal Agency for Civic Education and Robert Bosch Foundation in 2013 and joined FAM Frauenakademie München, a research institute for women’s and gender issues. In 2017 she was awarded the Marshall Memorial Fellowship, the flagship leadership development program of the German Marshall Fund of the United States.

Zuzanna passed the bar exam in April 2017 and served as the vice-president of the Human Rights Section of the Warsaw Bar Association from March 2016 until May 2017. She is a member of the Women’s Rights Group by the Polish Bar Council.

Zuzanna is a member of the advisory board of the Institute for the Ethics of AI at the Technical University of Munich.

Since 2019 she has been acting as an independent expert to the European Commission, where she is involved in the ethics monitoring of research and innovation projects.

Zuzanna is passionate about the protection of the environment and women’s rights. She is a lecturer at the School of Ecopoetics established at the Reportage Institute in Warsaw.

Zuzanna lives in Warsaw. In her free time, she devours podcasts and takes long walks in the woods with her dog Bruno.

The Paradox of Open: Can Digital Commons Offer a Way Forward?