09-15, 13:40–14:10 (Europe/Amsterdam), Qux
Have you ever struggled with the multitude of columns created by a One Hot Encoder? Or wanted to look beyond it, but found it hard to decide which feature encoder would be a good replacement?
The good news is that many encoding techniques have been developed to address different types of categorical data. This talk provides an overview of the encoding methods available in data science, and guidance on deciding which one is appropriate for the data at hand.
Join this talk if you would like to hear why feature encoding matters and why One Hot Encoding should not be the default in every scenario. The talk starts with commonly used approaches and progresses to more advanced and powerful techniques that can help extract meaningful information from the data.
For each presented encoder, after this talk you will know:
- When to use it
- When NOT to use it
- Important considerations specific to the encoder
- A Python library that offers a built-in implementation of the encoder, for easy integration into feature engineering pipelines
I will explore different feature encoding approaches and provide guidance for decision-making. I will cover simpler methods like Label, One Hot, and Frequency encoding, progressing to powerful techniques like Target and Rare Label encoding. Finally, I will explain more complex approaches like Weight of Evidence, Hash, and CatBoost encoding. I will close the talk by summarizing the key takeaways.
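To make the contrast between some of these methods concrete, here is a minimal sketch using only pandas. It is not taken from the talk: the toy data and the column names `city` and `churned` are invented for illustration, and the Target encoding shown is the plain per-category mean without smoothing.

```python
import pandas as pd

# Toy data: hypothetical column names and values, invented for illustration.
df = pd.DataFrame({
    "city": ["Amsterdam", "Utrecht", "Amsterdam", "Delft", "Amsterdam", "Utrecht"],
    "churned": [1, 0, 1, 0, 0, 1],
})

# One Hot encoding: one new column per distinct category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Frequency encoding: replace each category with its relative frequency.
freq = df["city"].map(df["city"].value_counts(normalize=True))

# Target (mean) encoding: replace each category with the mean of the target
# observed for that category. No smoothing is applied in this sketch.
target_mean = df["city"].map(df.groupby("city")["churned"].mean())

encoded = pd.DataFrame({
    "city": df["city"],
    "city_freq": freq,
    "city_target": target_mean,
})

print(one_hot.shape)  # number of columns grows with the number of categories
print(encoded)
```

One Hot encoding adds a column per category, while Frequency and Target encoding each produce a single numeric column. Note that computing the target mean on the full dataset, as done here for brevity, leaks the target into the feature; in practice the mapping would be fitted on training data only.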
Target Audience:
Data scientists and anyone interested in feature encoding
Previous experience with feature encoders can be useful but is not mandatory to follow the talk.
No previous knowledge expected
Ana is a data scientist experienced in the payments industry with a focus on the risk domain. With a background in information and data science, she has contributed to building ML solutions for mitigating customer risk and optimizing customer monitoring processes.