PyData Amsterdam 2023

Achieving developer autonomy on on-premise data clusters using Kubernetes.
09-15, 10:50–11:20 (Europe/Amsterdam), Qux

Maintaining on-premise clusters poses quite a few challenges. One of these challenges is achieving developer autonomy, where developers can deploy applications themselves. This talk will cover how we set up Kubernetes to achieve exactly that.


As your datasets grow and you gain more use cases, so does the number of required tools and applications. Where in the past a data cluster consisted of just HDFS, Spark, Airflow and Postgres, you now need OLAP databases, distributed query engines, parallel computing for your model training and much more. All of this puts a lot of pressure on the infrastructure team responsible for installing and maintaining all the tools on your platform. By introducing Kubernetes, we narrow that team's responsibility to maintaining just HDFS and Kubernetes, and move the responsibility for introducing and maintaining the data tools to the data (platform) engineers.

In this talk we will cover how we achieved developer autonomy by touching on the following subjects:
- What is the first step to installing Kubernetes on premise?
- How do we deploy changes automatically?
- How do we make an experimentation-friendly environment for developers while remaining secure?
- How do we handle secrets to connect different applications together?
- Finally, some lessons learned from the migration process.

Recommended prior knowledge: it helps if you know what Kubernetes is.


Prior Knowledge Expected

Previous knowledge expected

Jorrick is a Data Platform Engineer at Adyen. With a background in computer science, his focus has been on introducing and maintaining tools on the data platform. On the side, Jorrick is an active open-source contributor to pet projects and Apache Airflow. One of his contributions was awarded PR of the month by the Apache Airflow project.