PyData Amsterdam 2023

Causal Inference Libraries: What They Do, What I'd Like Them To Do
09-14, 11:20–11:50 (Europe/Amsterdam), Qux

This talk will explore the Python tooling and ecosystem for estimating conditional average treatment effects (CATEs) in a Causal Inference setting. Using real-world examples, it will compare the strengths and weaknesses of existing libraries and outline desirable functionality not currently offered by any public library.


Conditional average treatment effects (CATEs) are a fundamental concept in Causal Inference, allowing for the estimation of the effect of a particular treatment or intervention. For CATEs, the effect is estimated not just for an entire population, e.g. all experiment participants, but for units with individual characteristics, e.g. a single experiment participant. This can be very important for meaningfully personalizing services and products. In this talk, we will explore the Python tooling and ecosystem for estimating CATEs, including libraries such as EconML and CausalML.
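
To make the idea of a CATE concrete, here is a minimal sketch of one common conceptual approach, often called a T-learner: fit one outcome model on treated units, one on control units, and take the difference of their predictions. This is not taken from the talk and does not use EconML or CausalML; the synthetic data, the linear outcome models and all function names (`fit_linear`, `predict_linear`, `estimate_cate`) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical synthetic experiment: one covariate x and a randomized
# binary treatment t (as in an A/B test).
x = rng.uniform(-1.0, 1.0, size=n)
t = rng.integers(0, 2, size=n)

# By construction, the true CATE is tau(x) = 1 + 2x.
true_cate = 1.0 + 2.0 * x
y = 0.5 * x + t * true_cate + rng.normal(0.0, 0.1, size=n)


def fit_linear(x, y):
    """Least-squares fit of y ~ intercept + slope * x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return coef[0] + coef[1] * x


# T-learner: one outcome model per treatment arm.
coef_treated = fit_linear(x[t == 1], y[t == 1])
coef_control = fit_linear(x[t == 0], y[t == 0])


def estimate_cate(x_new):
    """Estimated CATE: predicted treated outcome minus predicted control outcome."""
    x_new = np.asarray(x_new, dtype=float)
    return predict_linear(coef_treated, x_new) - predict_linear(coef_control, x_new)


cate_hat = estimate_cate(x)   # one effect estimate per unit
ate_hat = cate_hat.mean()     # averaging recovers a population-level effect
```

Because the treatment is randomized here, the difference between the two fitted models is a reasonable CATE estimate; with observational data, additional assumptions and adjustments would be needed.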

We will begin by providing an overview of the theory behind CATE estimation, how it fits into the broader field of causal inference and how Machine Learning has recently broken into CATE estimation. We will then dive into the various libraries available for Python, discussing their strengths and weaknesses and providing real-world examples of their usage.

Specifically, we will cover:
- EconML: An open-source library for general Causal Inference purposes, by Microsoft Research
- CausalML: An open-source library for uplift modeling in particular, by Uber

We will compare and contrast these libraries with respect to CATE estimation, discussing which methods they use, which assumptions they make, and which types of data they are best suited for. We will also provide code examples to illustrate how to use each library in practice. Moreover, we will discuss what we think is missing from both of them.

By the end of the talk, attendees will have a solid understanding of the Python tooling and ecosystem for estimating CATEs in a causal inference setting. They will know which libraries to use for different types of data and which methods are most appropriate for different scenarios.

This talk could be particularly relevant for Data Scientists wishing to analyze experiments, such as A/B tests, or trying to derive causal statements from observational, non-experimental data. Participants are not expected to have Causal Inference expertise. Yet, a fundamental understanding of Machine Learning and Probability Theory will be beneficial.

0-5’: Why Causal Inference and why CATE estimation?
5-10’: What are some conceptual ways of estimating CATEs?
10-20’: How can we use EconML and CausalML for CATE estimation on a real dataset?
20-30’: What are we missing from EconML and CausalML?


Prior Knowledge Expected

No previous knowledge expected

Kevin is a Data Scientist at QuantCo, working on fraud detection, risk modelling and experimentation. Prior to joining QuantCo, he focused on Natural Language Processing, discrete optimization and Bayesian optimization during his Computer Science major at ETH Zurich.
He's not very original in that he likes functional programming, running and writing.