PyData Amsterdam 2023

Declarative data manipulation pipeline with Dagster
09-14, 14:10–14:40 (Europe/Amsterdam), Bar

Bored of your old pipeline orchestrator? Struggling to tell whether your data is up to date? Having trouble with the development workflow of your data pipelines?
Dagster, an open-source tool, offers a unique paradigm that simplifies the orchestration and management of data pipelines.
By adopting declarative principles, data engineers and data scientists can build scalable, maintainable, and reliable pipelines effortlessly.
We will commence with an introduction to Dagster, covering its fundamental concepts to ensure a comprehensive understanding of the material.
Subsequently, we will explore practical scenarios and use cases, including DBT to harness the power of the SQL language.

Minutes 0-5: The design pattern problem of current data pipeline frameworks.
Minutes 5-15: Introduction to Dagster and its core concepts.
Minutes 10-25: Practical examples of building declarative data pipelines with Dagster, including DBT and the power of the gRPC server.
Minutes 25-30: Q&A and conclusion.


Are you tired of struggling with outdated pipeline orchestrators? Do you find it challenging to ensure your data is always up-to-date? Are you facing difficulties with the development workflow of your data pipeline?

In this session, we will introduce Dagster, an open-source tool that revolutionizes the orchestration and management of data pipelines. By embracing declarative principles, data engineers and data scientists can effortlessly build scalable, maintainable, and reliable pipelines.

We will begin with an overview of the design pattern problem that many existing data pipeline frameworks face. Understanding the limitations of these frameworks will set the stage for exploring the transformative capabilities of Dagster.

Next, we will delve into the core concepts of Dagster, ensuring a comprehensive understanding of the material. You will learn how Dagster simplifies pipeline development and execution by providing a declarative and intuitive approach. Through practical examples and hands-on demonstrations, we will showcase how you can leverage Dagster to build powerful data pipelines.

But that's not all! We will also explore the integration of DBT, empowering you to harness the full potential of the SQL language within your data pipelines. You will witness the synergy between Dagster and DBT, unlocking new possibilities for data manipulation and transformation.

By the end, you'll be equipped with the knowledge and inspiration to elevate your data pipeline workflows to new heights.

Outline:

Minutes 0-5: Understanding the design pattern problem of existing data pipeline frameworks
Minutes 5-15: Introduction to Dagster and its core concepts
Minutes 10-25: Practical examples of building declarative data pipelines with Dagster, including the integration with DBT and the power of the gRPC server
Minutes 25-30: Q&A and conclusion


Prior Knowledge Expected – Previous knowledge expected

Senior Data Engineer at Agile Lab with a background as a Data Scientist and Software Engineer.
When I don't work with data pipelines, I juggle between closing some of my 100+ open browser tabs and my true passion: collecting stars on GitHub 🔭🌟. In this treasure trove of more than 2,000 repositories, I am pretty sure I can find a tool to solve any problem, and I can't wait to share them with you.