PyData Amsterdam 2023

Generating Data Frames for your tests - using Pandas strategies in Hypothesis
09-16, 10:00–11:20 (Europe/Amsterdam), Hello, World! (Tutorials)

Do you test your data pipeline? Do you use Hypothesis? In this workshop, we will use Hypothesis, a property-based testing framework, to generate Pandas DataFrames for your tests without involving any real data.
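As a minimal sketch of the idea (not taken from the workshop material), the test below asks Hypothesis to generate small DataFrames through the hypothesis.extra.pandas module (installed with pip install "hypothesis[pandas]") and checks that writing them to CSV and reading them back preserves them. The column names, value bounds and the round-trip property are illustrative assumptions.

```python
import io

import hypothesis.strategies as st
import pandas as pd
from hypothesis import given
from hypothesis.extra.pandas import column, data_frames, range_indexes

# Strategy for DataFrames with two integer columns and 1 to 20 rows.
# Column names and bounds are illustrative, not from the workshop.
orders = data_frames(
    columns=[
        column("order_id", dtype="int64",
               elements=st.integers(min_value=0, max_value=10_000)),
        column("quantity", dtype="int64",
               elements=st.integers(min_value=-1_000, max_value=1_000)),
    ],
    index=range_indexes(min_size=1, max_size=20),
)


@given(orders)
def test_csv_round_trip(df: pd.DataFrame) -> None:
    # Property: writing to CSV and reading it back gives the same frame.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    buffer.seek(0)
    restored = pd.read_csv(buffer)
    pd.testing.assert_frame_equal(df, restored)
```

Running this file with pytest executes the test against many generated frames; if the property fails, Hypothesis shrinks the failing frame to a minimal example.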


In this short 90-minute workshop, we will first go through the basics of Hypothesis and explain what property-based testing is. After that, we will introduce the strategies for Pandas objects, available via the extras in Hypothesis. We will take a glimpse at how these strategies generate test objects, including Pandas Series and DataFrames. Finally, we will apply what we have learned to a real testing application: testing a data pipeline that involves DataFrames.
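The basics can be illustrated with a short, hedged example: a property-based test states a property that should hold for every input, and Hypothesis searches for an input that breaks it. The sketch below uses the series strategy from hypothesis.extra.pandas to draw integer Series and checks three properties of Series.sort_values; the bounds and the chosen properties are illustrative, not the workshop's own exercises.

```python
import hypothesis.strategies as st
import pandas as pd
from hypothesis import given
from hypothesis.extra.pandas import range_indexes, series

# Integer Series with up to 30 values; dtype and bounds are illustrative.
int_series = series(
    elements=st.integers(min_value=-100, max_value=100),
    dtype="int64",
    index=range_indexes(max_size=30),
)


@given(int_series)
def test_sort_values_properties(s: pd.Series) -> None:
    result = s.sort_values()
    # Property 1: sorting neither adds nor drops values.
    assert len(result) == len(s)
    # Property 2: the output is in non-decreasing order.
    assert result.is_monotonic_increasing
    # Property 3: the same values, with the same multiplicities, remain.
    assert sorted(result.tolist()) == sorted(s.tolist())
```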

The workshop material can be found in this repo.

Outline

  • Introduction to property-based testing (15 mins)
  • Introduction to Hypothesis and exercises on its basic use (30 mins)
  • Deep dive into Pandas strategies (20 mins)
  • Do it yourself - apply property-based testing to data pipelines (20 mins)
  • Conclusion (5 mins)

Prerequisites

No prior knowledge of property-based testing or Hypothesis is required. However, we assume attendees have experience using Pandas and a basic understanding of Pandas objects. Knowledge of NumPy arrays and typing would also be helpful for understanding the Pandas strategies.
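To show why NumPy knowledge helps, here is a sketch in which the dtype passed to the columns helper is a NumPy dtype and the elements strategy must produce values that fit it. The min_max_scale function is a hypothetical pipeline step invented for this example, and the float bounds (which also exclude NaN and infinity) are assumptions chosen so that the property holds.

```python
import hypothesis.strategies as st
import numpy as np
import pandas as pd
from hypothesis import given
from hypothesis.extra.pandas import columns, data_frames, range_indexes


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: rescale each column to the range [0, 1]."""
    span = df.max() - df.min()
    # Constant columns have zero span; divide by 1.0 instead of 0.0.
    span = span.replace(0.0, 1.0)
    return (df - df.min()) / span


# dtype is a NumPy dtype; elements must generate values that fit it.
float_frames = data_frames(
    columns=columns(["a", "b"], dtype=np.float64,
                    elements=st.floats(min_value=-1e6, max_value=1e6)),
    index=range_indexes(min_size=1, max_size=20),
)


@given(float_frames)
def test_min_max_scale(df: pd.DataFrame) -> None:
    scaled = min_max_scale(df)
    # The generated columns really have the requested NumPy dtype.
    assert all(dtype == np.float64 for dtype in df.dtypes)
    # The shape is unchanged and every scaled value lies in [0, 1].
    assert scaled.shape == df.shape
    assert ((scaled >= 0.0) & (scaled <= 1.0)).all().all()
```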

Goal

We hope attendees will learn about property-based testing and see how it can benefit their work involving data, especially work that uses Pandas. After the workshop, attendees should understand how the Pandas strategies in Hypothesis work and be able to use Hypothesis to test code that takes Pandas Series or DataFrames as input.
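As a sketch of testing code that takes a DataFrame as input, the example below tests a hypothetical clean_orders step (not from the workshop material) against generated order data. Rather than checking specific outputs, it asserts properties that should hold for any input: no rows are invented, invalid rows are removed, and the derived revenue column is consistent with its inputs. Column names and bounds are assumptions.

```python
import hypothesis.strategies as st
import pandas as pd
from hypothesis import given
from hypothesis.extra.pandas import column, data_frames, range_indexes


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: drop invalid rows and add a revenue column."""
    valid = df[df["quantity"] > 0].copy()
    valid["revenue"] = valid["price"] * valid["quantity"]
    return valid


# Generated order data, including invalid (non-positive) quantities.
orders = data_frames(
    columns=[
        column("price", dtype="float64",
               elements=st.floats(min_value=0, max_value=1e4)),
        column("quantity", dtype="int64",
               elements=st.integers(min_value=-10, max_value=100)),
    ],
    index=range_indexes(max_size=30),
)


@given(orders)
def test_clean_orders(df: pd.DataFrame) -> None:
    result = clean_orders(df)
    # The cleaning step never invents rows.
    assert len(result) <= len(df)
    # Every remaining row has a positive quantity.
    assert (result["quantity"] > 0).all()
    # Revenue is consistent with the price and quantity it came from.
    assert (result["revenue"] == result["price"] * result["quantity"]).all()
    # Original columns are kept, with revenue appended at the end.
    assert list(result.columns) == ["price", "quantity", "revenue"]
```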


Prior Knowledge Expected

No previous knowledge expected

Before working in Developer Relations, Cheuk was a Data Scientist at various companies, in roles that demanded strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk is now a Developer Advocate. Cheuk also contributes to multiple open source libraries such as Hypothesis, Pandas and Django.

Besides her work, Cheuk enjoys talking about Python on personal streaming platforms and podcasts. Cheuk has also spoken at universities and various conferences. Besides speaking at conferences, Cheuk also organizes events for developers; conferences she has organized include EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in tech diversity and inclusion, Cheuk regularly organizes workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation Fellow.