PyData Amsterdam 2023

Building a personal search engine with llama-index
09-15, 11:30–12:30 (Europe/Amsterdam), Hello, World! (Tutorials)

Wouldn’t it be great to have a Google-like search engine for your own text files that stays completely private? In this tutorial we’ll build a small personal search engine using the open-source library llama-index.


Llama-index provides utility functions for ingesting various kinds of data, breaking the data up into chunks, building an index of that data using vector embeddings, and retrieving data from the index based on queries. We can even use llama-index to post-process the retrieval results with large language models such as GPT.
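
To make the ingest, chunk, index, and retrieve steps concrete, here is a minimal sketch of the same pipeline in plain Python. It does not use llama-index and is not the tutorial's code: the function names (chunk_text, embed, build_index, search), the chunk sizes, and the sample documents are all assumptions made for illustration, and the "embedding" is a simple word-count vector standing in for a real neural embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping chunks of at most `chunk_size` words."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop early enough that no chunk consists only of already-covered overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Turn {name: text} into a list of (name, chunk, vector) entries."""
    return [
        (name, chunk, embed(chunk))
        for name, text in documents.items()
        for chunk in chunk_text(text)
    ]


def search(index, query, top_k=2):
    """Return the top_k (score, name, chunk) entries most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), name, chunk) for name, chunk, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample documents standing in for your own text files.
documents = {
    "notes.txt": "Vector embeddings map text to points in space so that "
                 "similar texts end up close together.",
    "recipe.txt": "Boil the pasta for ten minutes, then add tomato sauce "
                  "and fresh basil.",
}

index = build_index(documents)
results = search(index, "how long to boil pasta", top_k=1)
for score, name, chunk in results:
    print(f"{score:.2f}  {name}: {chunk}")
```

A word-count vector only matches exact words, which is why real systems swap in learned embeddings that also place paraphrases close together; the surrounding chunking, indexing, and ranking logic keeps the same shape.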

The target audience is people who are already familiar with Python. Participants will gain experience working with unstructured data and vector embeddings, and will explore the possibilities opened up by recent developments in natural language processing.

Workshop materials will be provided via GitHub before the start of the workshop. For the demo application, we will only use open-source software and models that are light enough to run on an average laptop without a GPU.
Using llama-index with OpenAI’s API is optional. If you want to explore post-processing your results with GPT-3.5, we recommend registering an OpenAI account and having your OpenAI API key ready.

The GitHub repository with all tutorial materials can be found here: https://github.com/datakami/pydata-llama-index-tutorial


Prior Knowledge Expected: Previous knowledge expected

Judith van Stegeren is a researcher, engineer and consultant. Together with Yorick van Pelt, she runs Datakami, a consulting firm specialized in large language models.

Previously, she worked as a machine learning engineer at Floryn, a Dutch fintech startup that finances small and medium-sized businesses. Before that, she researched text generation at the University of Twente and worked as an information security specialist at the Dutch National Cyber Security Centre in The Hague.

Judith holds a PhD in computer science from the University of Twente. Her interests include investing, natural language processing, procedural art and video games.

Yorick is a consultant at Datakami, working on generative AI for finance and creativity. He also has years of consulting experience in data science, ops and functional programming.