PyData Amsterdam 2023

MLOps on the fly: Optimizing a feature store with DuckDB and ArrowFlight
09-15, 15:20–15:50 (Europe/Amsterdam), Foo (main)

Feature Stores are a vital part of the MLOps stack for managing machine learning features and ensuring data consistency. This talk introduces Feature Stores and their underlying data management architecture. We'll then discuss the challenges and lessons learned from integrating DuckDB and Arrow Flight into our Feature Store platform, and share benchmarks showing speedups of up to 30x compared to Spark/Hive. Discover how DuckDB and Arrow Flight can also speed up your data management and machine learning pipelines.


In this talk, we will cover the following topics:

• Introduction to Machine Learning Feature Stores (5 min): Understanding the role of feature stores in the MLOps stack and their significance in managing machine learning features within organizations.
• Data management architecture behind Feature Stores (2-3 min): Exploring the underlying mechanisms and data management components employed in feature stores.
• Introduction to DuckDB and Arrow Flight (5 min): Highlighting how DuckDB and Arrow Flight integrate with the PyData ecosystem by building on the capabilities of Apache Arrow.
• The journey of integrating DuckDB and Arrow Flight into our Feature Store platform (12 min): Sharing our experiences and insights on integrating DuckDB and Arrow Flight into the Hudi-based Lakehouse platform that powers our (offline) feature store, discussing challenges and successes encountered along the way.
• Benchmarks (5 min): Presenting a benchmark comparing the performance of DuckDB/Arrow Flight with Spark/HiveServer2, in particular for small- to medium-sized data.

Attendees will gain a deeper understanding of feature stores, insights into the integration of DuckDB and Arrow Flight into the PyData ecosystem, and practical knowledge on improving the performance of machine learning pipelines.


Prior Knowledge Expected

Previous knowledge expected

Fabio Buso is VP of Engineering at Hopsworks, leading the Feature Store development team. Fabio holds a master’s degree in Cloud Computing and Services with a focus on data intensive applications.

Till Döhmen is a Research Engineer at Hopsworks, where he contributes to the development of Hopsworks' Python-centric Feature Store platform. In addition to his work at Hopsworks, he is a guest researcher at the Intelligent Data Engineering Lab of the University of Amsterdam, where he works on research at the intersection of data management and machine learning.