Fokko Driesprong PyData Amsterdam 2023

Fokko Driesprong
Open Source enthusiast. Committer on Avro, Parquet, Druid, Airflow and Iceberg. Apache Software Foundation member.

Sessions

09-14

15:10

30min

PyIceberg: Tipping your toes into the petabyte data-lake

Fokko Driesprong

With Apache Iceberg, you store your big data in the cloud as files (e.g., Parquet), but then query it as if it’s a plain SQL table. You enjoy the endless scalability of the cloud, without having to worry about how to store, partition, or query your data efficiently. PyIceberg is the Python implementation of Apache Iceberg that loads your Iceberg tables into PyArrow (pandas), DuckDB, or any of your preferred engines for doing data science. This means that with PyIceberg, you can tap into big data easily by only using Python. It’s time to say goodbye to the ancient Hadoop-based frameworks of the past! In this talk, you'll learn why you need Iceberg, how to use it, and why it is so fast.
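As a rough illustration of what loading an Iceberg table into PyArrow can look like, here is a minimal sketch and not material from the talk itself. The helper name scan_to_arrow, the table identifier "nyc.taxis", and the column names are hypothetical. The sketch assumes a PyIceberg catalog has already been configured and relies on PyIceberg's load_table, scan, and to_arrow calls.

```python
from typing import Any, Optional, Sequence


def scan_to_arrow(
    catalog: Any,
    identifier: str,
    row_filter: str,
    columns: Sequence[str],
    limit: Optional[int] = None,
) -> Any:
    """Read a filtered, column-pruned slice of an Iceberg table into Arrow.

    `catalog` is assumed to be a PyIceberg catalog, for example the object
    returned by `pyiceberg.catalog.load_catalog("default")`.
    """
    # Look up the table's current metadata through the catalog.
    table = catalog.load_table(identifier)

    # Describe the read: which rows, which columns, and at most how many rows.
    # Nothing is read yet; the scan only plans which data files are needed.
    scan = table.scan(
        row_filter=row_filter,
        selected_fields=tuple(columns),
        limit=limit,
    )

    # Materialize the result as a PyArrow table.
    return scan.to_arrow()


# Hypothetical usage, assuming a catalog named "default" is configured:
#
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("default")
#   trips = scan_to_arrow(
#       catalog,
#       "nyc.taxis",                      # hypothetical table identifier
#       "trip_distance > 10",             # hypothetical filter expression
#       ["trip_distance", "fare_amount"], # hypothetical column names
#       limit=1000,
#   )
#   df = trips.to_pandas()
```

Passing the filter and column list to the scan, rather than filtering after loading, lets PyIceberg use the table metadata to skip data files and columns that cannot match. The same scan object also offers to_pandas() and to_duckdb(table_name=...) for the other engines mentioned in the abstract.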

Bar
