Laptop-sized ML for Text, with Open Source
2023-06-19, Palais Atelier

Advanced ML models for text may need hundreds of machines, but with open source tools and pre-trained models, you can do a lot on just your laptop or a Docker container. Discover what and how!


AI text models like GPT-3, ChatGPT, Bing AI and GitHub Copilot are getting a lot of buzz right now, both good and bad. Many of the training techniques are public, but the computational and data requirements mean most of us can't build our own. Using these big models typically involves either paying for access or sharing your data. What if that's not an option?

Luckily, there are a number of open source language models out there, with pre-trained versions available to download! They won't let you compete with Google or OpenAI, but they're good enough for a number of real world problems.

We'll start with a quick introduction to the main open ML-for-text systems like Word2vec, GloVe, ELMo and BERT, along with how they differ from traditional text relevancy techniques like TF-IDF. Then, we'll discover how open source ML frameworks let us easily work with those techniques, and how pre-trained models let us quickly get up and running.
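
To make the TF-IDF comparison concrete, here is a minimal sketch in plain Python (standard library only). It is not the code from the talk: the function names, the toy documents and the smoothed IDF formula are all assumptions chosen for illustration. It shows the limitation that embedding models aim to address: TF-IDF only matches on shared words, so two texts with similar meaning but no words in common score zero.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF, one common variant (assumed here, not taken from the talk).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "a cat lay on a rug",
    "feline rests upon carpet",  # similar meaning to the first, no shared words
]
vecs = tfidf_vectors(docs)
sim_shared = cosine(vecs[0], vecs[1])    # share "cat" and "on"
sim_synonyms = cosine(vecs[0], vecs[2])  # no terms in common
print(f"shared words: {sim_shared:.3f}")
print(f"synonyms only: {sim_synonyms:.3f}")
```

The first pair gets a positive score because of the overlapping terms, while the synonym-only pair scores exactly zero. Dense embeddings from models such as Word2vec or BERT are intended to place "cat" and "feline" near each other, so that kind of pair can still match.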

With our ML-for-text model running on our laptop (or hefty Docker container!), it's time to see what kinds of problems we can solve! We'll look at embeddings for search, inference, semantic reasoning, prediction and more, all with (fairly) minimal coding. Finally, we'll see how we can improve the pre-trained models for specific use-cases with our own text.

It may not run on your phone and it probably won't hallucinate incorrect answers, but there are still a lot of text problems we can solve with just open source on our laptops. And we'll share the code you need to do so!

See also: Slides (2.9 MB)

Nick is heavily involved in a number of Apache projects, such as Tika and POI, while having the fortune to know many of the people involved in the Apache Big Data and Search space! When not helping out with Apache things, Nick works as the Director of Engineering at FLEC, where he leads a team making heavy use of Open Source technologies. When not helping improve the logistics industry, he is often to be found attending or organising BarCamps, Geek Nights, or other such fun events dedicated to sharing what's great and new!
