PyData Amsterdam 2023

Jon Smith

Jon Smith is a Senior Machine Learning Scientist at Booking.com, where he has worked in fraud detection and performance marketing. In these areas, he focusses on strengthening software practices within critical ML systems by evangelising code quality and unit testing.

He studied Mathematics and Computer Science at Acadia University and Simon Fraser University in Canada, and spent some time as a Machine Learning Engineer at the Canadian Broadcasting Corporation.

Sessions

09-14
12:00
30min
Tables as Code: The Journey from Ad-hoc Scripts to Maintainable ETL Workflows at Booking.com
Bram van den Akker, Jon Smith

Until a few years ago, data science & engineering at Booking.com had grown largely in an ad-hoc manner. This growth led to a labyrinth of unrelated scripts representing Extract-Transform-Load (ETL) processes. Without a quick way to test cross-application interfaces, maintenance and contribution grew unwieldy, and debugging in production was common practice.

Over the past several years, we’ve spearheaded a transition from isolated workflows to a well-structured, community-maintained monorepo, a task that required not just technical adaptation but also a cultural shift.

Central to this transformation is the adoption of the concept of "tables as code", an approach that has changed the way we write ETL. Our lightweight PySpark extension represents table metadata as a Python class, exposing data to code, and enabling efficient unit test setup and validation.

In this talk, we walk you through the "tables as code" design and complementary tools such as efficient unit testing, robust telemetry, and automated builds using Bazel. Moreover, we will cover the transformation process, including enabling people with non-engineering backgrounds to create fully tested and maintainable ETL. This includes internal training, maintainers, and support strategies aimed at fostering a community knowledgeable in best practices.
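
To make the "tables as code" idea concrete, here is a minimal sketch of table metadata declared as a Python class and reused for unit-test validation. It is not the Booking.com PySpark extension, whose API the abstract does not show: it uses plain Python with no Spark dependency, and every name in it (`Column`, `Table`, `Reservations`, `revenue_by_country`) is a hypothetical invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Column:
    """One column of a table: name, expected Python type, nullability."""
    name: str
    dtype: type
    nullable: bool = True


class Table:
    """Hypothetical base class: subclasses declare metadata as class attributes."""
    name: str = ""
    columns: Tuple[Column, ...] = ()

    @classmethod
    def column_names(cls) -> List[str]:
        return [c.name for c in cls.columns]

    @classmethod
    def validate(cls, rows: List[Dict[str, Any]]) -> List[str]:
        """Return human-readable schema violations; an empty list means valid."""
        errors: List[str] = []
        expected = set(cls.column_names())
        for i, row in enumerate(rows):
            if set(row) != expected:
                errors.append(f"row {i}: unexpected column set {sorted(row)}")
                continue
            for col in cls.columns:
                value = row[col.name]
                if value is None:
                    if not col.nullable:
                        errors.append(f"row {i}: {col.name} must not be null")
                elif not isinstance(value, col.dtype):
                    errors.append(
                        f"row {i}: {col.name} expected {col.dtype.__name__}"
                    )
        return errors


class Reservations(Table):
    """Illustrative table definition (assumed, not from the talk)."""
    name = "reservations"
    columns = (
        Column("reservation_id", int, nullable=False),
        Column("country", str),
        Column("price_eur", float),
    )


def revenue_by_country(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Toy transform standing in for an ETL step that reads Reservations."""
    totals: Dict[str, float] = {}
    for row in rows:
        if row["price_eur"] is not None:
            totals[row["country"]] = totals.get(row["country"], 0.0) + row["price_eur"]
    return totals
```

In a unit test, the same class can check hand-written fixture rows before they are fed to the transform, for example by asserting that `Reservations.validate(fixture_rows)` returns an empty list. A real PySpark-based version would presumably derive a Spark schema from the class rather than validating dictionaries, but that detail is an assumption here.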