PyData Amsterdam 2023

Bram van den Akker

Bram van den Akker is a Senior Machine Learning Scientist at Booking.com with a background in Computer Science and Artificial Intelligence from the University of Amsterdam. At Booking.com, Bram has been one of the founders of bkng-data, an internal collection of Python tools aimed at improving code quality, testing, and streamlining CI/CD for data practitioners.

Aside from bkng-data, Bram's work focuses on bridging the gap between applied research and practical requirements for Bandit Feedback all across Booking.com. Previously, Bram held positions at Shopify, Panasonic, and Eagle Eye Networks, and has contributed peer-reviewed work and tutorials to conferences and workshops such as TheWebConf (WWW), RecSys, and KDD, including a best-paper award.


Sessions

09-14
12:00
30min
Tables as Code: The Journey from Ad-hoc Scripts to Maintainable ETL Workflows at Booking.com
Bram van den Akker, Jon Smith

Until a few years ago, data science & engineering at Booking.com had grown largely in an ad-hoc manner. This growth led to a labyrinth of unrelated scripts representing Extract-Transform-Load (ETL) processes. Without options for quickly testing cross-application interfaces, maintenance and contribution grew unwieldy, and debugging in production was common practice.

Over the past several years, we’ve spearheaded a transition from isolated workflows to a well-structured, community-maintained monorepo, a task that required not just technical adaptation, but also a cultural shift.

Central to this transformation is the adoption of the concept of "tables as code", an approach that has changed the way we write ETL. Our lightweight PySpark extension represents table metadata as a Python class, exposing data to code, and enabling efficient unit test setup and validation.
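To make the idea concrete, here is a minimal sketch of what representing table metadata as a Python class could look like. It is not the Booking.com extension, which is internal and not shown in this abstract. The names `Column`, `Table`, `Reservations`, `make_row`, `validate`, and `total_price` are all hypothetical, and the sketch deliberately uses plain Python dictionaries instead of PySpark DataFrames so that it runs without a Spark installation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    """Metadata for one column: its name, Python type, and nullability."""
    name: str
    dtype: type
    nullable: bool = True


class Table:
    """Hypothetical base class; subclasses declare columns as class attributes."""
    table_name: str = ""

    @classmethod
    def columns(cls) -> List[Column]:
        # Collect the Column attributes of this class in declaration order.
        return [v for v in vars(cls).values() if isinstance(v, Column)]

    @classmethod
    def validate(cls, row: Dict[str, Any]) -> List[str]:
        """Return human-readable schema violations for a single row."""
        errors = []
        expected = {c.name for c in cls.columns()}
        for extra in sorted(set(row) - expected):
            errors.append(f"unexpected column: {extra}")
        for col in cls.columns():
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    errors.append(f"missing required column: {col.name}")
            elif not isinstance(value, col.dtype):
                errors.append(
                    f"wrong type for {col.name}: {type(value).__name__}"
                )
        return errors

    @classmethod
    def make_row(cls, **overrides: Any) -> Dict[str, Any]:
        """Build a test row with a default per column, plus any overrides."""
        row = {c.name: c.dtype() for c in cls.columns()}
        row.update(overrides)
        return row


class Reservations(Table):
    # Assumed example table; the name and columns are illustrative only.
    table_name = "example.reservations"
    reservation_id = Column("reservation_id", int, nullable=False)
    hotel_name = Column("hotel_name", str)
    price = Column("price", float)


def total_price(rows: List[Dict[str, Any]]) -> float:
    # ETL logic refers to the column through the class, not a string literal.
    return sum(r[Reservations.price.name] for r in rows)
```

With the schema living in code, a unit test only has to specify the columns it cares about, for example `Reservations.make_row(price=10.0)`, and `Reservations.validate(row)` can check the output of a transformation against the same definition the transformation itself uses.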

In this talk, we walk you through the design of “tables as code” and complementary tools such as efficient unit testing, robust telemetry, and automated builds using Bazel. Moreover, we will cover the transformation process, including enabling people with non-engineering backgrounds to create fully tested and maintainable ETL. This includes internal training, maintainers, and support strategies aimed at fostering a community knowledgeable in best practices.

Foo (main)