Ana Chaloska PyData Amsterdam 2023

Ana Chaloska

Ana is a data scientist with experience in the payments industry and a focus on the risk domain. With a background in information and data science, she has contributed to building ML solutions for mitigating customer risk and optimizing customer monitoring processes.

Sessions

09-15

13:40

30min

To One-Hot or Not: A guide to feature encoding and when to use what

Ana Chaloska

Have you ever struggled with the multitude of columns created by a One-Hot Encoder? Or decided to look beyond it, but found it hard to choose which feature encoder would be a good replacement?

Good news: many encoding techniques have been developed to address different types of categorical data. This talk will provide an overview of the encoding methods available in data science, along with guidance on deciding which one is appropriate for the data at hand.

Join this talk if you would like to hear why feature encoding matters and why you should not default to One-Hot Encoding in every scenario. It will start with commonly used approaches and progress to more advanced and powerful techniques that can help extract meaningful information from the data.

For each presented encoder, after this talk you will know:
- When to use it
- When NOT to use it
- Important considerations specific to the encoder
- Which Python library offers a built-in implementation of the encoder, for easy integration into feature engineering pipelines
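
To make the column blow-up mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the toy column `merchant_country`, the helper names `one_hot` and `frequency_encode`, and the choice of frequency encoding as the alternative are all assumptions made for illustration. In practice a library encoder (for example scikit-learn's `OneHotEncoder`) would normally be used instead of hand-written helpers.

```python
from collections import Counter

# Hypothetical toy column; the values are made up for illustration.
merchant_country = ["NL", "DE", "NL", "FR", "NL", "DE", "BE", "NL"]


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


categories, one_hot_rows = one_hot(merchant_country)
freq_column = frequency_encode(merchant_country)

# One-hot encoding needs one column per distinct value in the data.
print(len(categories), "one-hot columns:", categories)
# Frequency encoding always produces a single numeric column.
print("frequency-encoded column:", freq_column)
```

The one-hot representation grows with the number of distinct categories, while the frequency-encoded version stays a single column. The trade-off in this sketch is that categories with the same count (here `FR` and `BE`) become indistinguishable, which is the kind of encoder-specific consideration the list above refers to.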
