Fokko Driesprong
Open Source enthousiast. Committer on Avro, Parquet, Druid, Airflow and Iceberg. Apache Software Foundation members.
Sessions
With Apache Iceberg, you store your big data in the cloud as files (e.g., Parquet), but then query it as if it’s a plain SQL table. You enjoy the endless scalability of the cloud, without having to worry about how to store, partition, or query your data efficiently. PyIceberg is the Python implementation of Apache Iceberg that loads your Iceberg tables into PyArrow (pandas), DuckDB, or any of your preferred engines for doing data science. This means that with PyIceberg, you can tap into big data easily by only using Python. It’s time to say goodbye to the ancient Hadoop-based frameworks of the past! In this talk, you'll learn why you need Iceberg, how to use it, and why it is so fast.