PyData Amsterdam 2023

Adrin works on a few open source projects including skops which tackles some of the MLOps challenges related to scikit-learn models. He has a PhD in Bioinformatics, has worked as a consultant, and in an algorithmic privacy and fairness team. He's also a core developer of scikit-learn and fairlearn.

Let’s exploit pickle, and `skops` to the rescue!

Alessandro Cappelli

Keynote "Processing billions of tokens for training Large Language Models, tools and knowledge"

Alexander Backus

Alexander is Data Science Manager at Dexter Energy, where he is currently leading the development of machine learning-powered short-term power trading optimization products. He brings extensive hands-on machine learning engineering and data science management experience from various industries, including organizations such as KLM Royal Dutch Airlines, ING Bank, Heineken, VodafoneZiggo and IKEA.

Harnessing uncertainty: the role of probabilistic time series forecasting in the renewable energy transition

Alexandre Sajus

Alex worked in Amazon Business Intelligence. He graduated with a Master of Engineering at CentraleSupélec - Paris-Saclay University and joined Taipy as a Community Success Consultant. His primary skills are MLOps, Machine Learning, Data Engineering, and Python.

Turning your Data/AI algorithms into full web applications in no time with Taipy

Alon Nir

Data scientist (Data Lead) at Spotify. Dismal scientist by education. Advocating against pie charts since 2015. Self-proclaimed GIF connoisseur.

Power Users, Long Tail Users, and Everything In Between: Choosing Meaningful Metrics and KPIs for Product Strategy

Alyona Galyeva

Alyona Galyeva is an organizer of PyLadies Amsterdam, co-organizer of MLOps and Crafts, Microsoft AI MVP and Principal Engineer at Thoughtworks.
Observe - Optimize - Learn - Repeat
Passionate about encouraging others to see different perspectives and constructively break the rules.
I found my joy in building, optimizing, and deploying end-to-end AI and Data Engineering Solutions.

Data Contracts in action powered by Python open source ecosystem

Ana Chaloska

Ana is a data scientist experienced in the payments industry with a focus on the risk domain. With a background in information and data science, she has contributed to building ML solutions for mitigating customer risk and optimizing customer monitoring processes.

To One-Hot or Not: A guide to feature encoding and when to use what

Andy Kitchen

I've helped found multiple start-ups, including CorticalLabs, an AI+Biotech company working on "Synthetic Biological Intelligence". I've co-authored several papers and patents in deep learning and neuroscience. I've made a mess in more than a dozen programming languages over my career. My stack is full. I've worked on everything from custom neural interface hardware to web apps. I've won a few hack-a-thons. I started the Machine Learning and AI meetup in Melbourne, Australia, and helped found & organize the Compose :: Melbourne conference. I have two cats, I scoop their poop most days.

Promptly Evaluating Prompts with Bayesian Tournaments

Arliss Collins

Arliss is the Developer Advocate at NumFOCUS, working
closely with 60+ open-source projects in the scientific computing
ecosystem to assess and support their needs related to
sustainability, governance, community building, and more.

Open source, DEI, data, and problem-solving are her
passions.

In a past life, Arliss was an engineer and data scientist with a
focus on geophysics before beginning her foray into open
source through the role of data analyst at the Mozilla
Foundation, Science Lab.

Unconference: How to Host a DEI Unconference

Azamat Omuraliev

Azamat Omuraliev is a Senior Data Scientist at ING. Cracking the problem of personalization since joining ING in 2020! Decided to stay on this topic because it’s a challenge that requires getting many things right: constructing the right kind of machine learning model, staying in touch with customers and handling millions of interactions daily. Thanks to that, still learning something new on the job every single day.

Originally from Kyrgyzstan, moved to the Netherlands for studies but stayed for friends and for Amsterdam ❤️

Building true Machine Learning MVPs: Validating the value chain as a product data scientist

Benjamin Bossan

I worked as a Data Scientist and Head of Data Science for a couple of years, now I'm a Machine Learning Engineer at Hugging Face. I'm also a maintainer of the skorch package (https://github.com/skorch-dev/skorch).

Extend your scikit-learn workflow with Hugging Face and skorch

Bram van den Akker

Bram van den Akker is a Senior Machine Learning Scientist at Booking.com with a background in Computer Science and Artificial Intelligence from the University of Amsterdam. At Booking.com, Bram has been one of the founders of bkng-data, an internal collection of Python tools aimed at improving code quality, testing, and streamlining CI/CD for data practitioners.

Aside from bkng-data, Bram's work focuses on bridging the gap between applied research and practical requirements for Bandit Feedback all across Booking.com. Previously, Bram has held positions at Shopify, Panasonic & Eagle Eye Networks, and has contributed peer-reviewed papers and tutorials to conferences and workshops such as TheWebConf (WWW), RecSys, and KDD, including a best-paper award.

Tables as Code: The Journey from Ad-hoc Scripts to Maintainable ETL Workflows at Booking.com

Busra Cikla

Busra is an experienced data scientist with a passion for analytics at ING’s Risk & Pricing Advanced Analytics Team in Amsterdam. Over the last 5 years at ING, she has designed and developed end-to-end advanced analytics solutions to business problems in different domains. Currently, she is working on real-time credit risk models using ML. Busra has a background in optimisation and operational research from her B.Sc. studies and holds an M.Sc. degree in Data Science.

Transfer Learning in Boosting Models

Cheuk Ting Ho

Before working in Developer Relations, Cheuk was a Data Scientist at various companies, roles that demanded strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk is now a Developer Advocate. Cheuk also contributes to multiple Open Source libraries like Hypothesis, Pandas and Django.

Besides her work, Cheuk enjoys talking about Python on personal streaming platforms and podcasts. Cheuk has also been a speaker at universities and various conferences. Besides speaking at conferences, Cheuk also organises events for developers. Conferences that Cheuk has organized include EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk constantly organizes workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation fellow.

Generating Data Frames for your test - using Pandas strategies in Hypothesis

Cor Zuurmond

Cor improves business processes with data.

With a background in physics and an MSc in data science, he is well-versed in various tools and practices. Cor understands, on a fundamental level, the techniques that he applies. He converts questions into automated and optimized processes. Simply put: Cor believes that actions lead to insights more often than insights lead to actions.

After laying out a solid data engineering foundation, Cor applies AI techniques to give every project or product an unrivaled edge. He loves to automate solutions to turn your most daunting business challenges into a walk in the park.

Minimizing the Data Mesh Mess

Danial Senejohnny

I am a data scientist with a background in applied mathematics (systems & control). In my career as a data scientist, I have worked in several sectors: manufacturing, cybersecurity, healthcare, and finance. Currently, I am contributing to data-driven solutions that improve our clients’ experience and satisfaction within ABN AMRO.

Survival Analysis: a deep dive

Daniel van der Ende

Daniel van der Ende is a Data Engineer at Xebia Data. He enjoys working on high performance distributed computation with Spark, empowering data scientists by helping them to run their models on very large datasets with high performance. He is an Apache Spark and Apache Airflow contributor and speaker at conferences and meetups.

Return to Data's Inferno: are the 7 layers of data testing hell still relevant?

Dror A. Guldin

Data Scientist (Tech Lead) at Meta

Power Users, Long Tail Users, and Everything In Between: Choosing Meaningful Metrics and KPIs for Product Strategy

Emeli Dral

Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing open-source tools to evaluate, test, and monitor the performance of machine learning models.

Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory. She led over 50 applied ML projects for various industries - from banking to manufacturing. Emeli is a data science lecturer at GSOM SpBU and Harbour.Space University. She is a co-author of the Machine Learning and Data Analysis curriculum at Coursera with over 100,000 students.

Mind the language: how to monitor NLP and LLM in production

Fabio Buso

Fabio Buso is VP of Engineering at Hopsworks, leading the Feature Store development team. Fabio holds a master’s degree in Cloud Computing and Services with a focus on data intensive applications.

MLOps on the fly: Optimizing a feature store with DuckDB and ArrowFlight

Felipe Moraes

I am a machine learning scientist at Booking.com working on personalized discounts under budget constraints.
I have a PhD in Computer Science from the Delft University of Technology. During my PhD, I interned as an applied scientist at Amazon Alexa Shopping, where I worked on finding proxies for what customers find relevant when comparing products during their search shopping journey in order to empower Amazon recommendation systems. Before that I obtained a BSc and MSc in Computer Science from the Federal University of Minas Gerais, visited research labs at NYU and the University of Quebec, and worked as a software engineer intern in a news recommendation system start up.

Enhancing Economic Outcomes: Leveraging Business Metrics for Machine Learning Model Optimization

Feng Zhao

Feng is a senior data scientist at Adyen. He is passionate about solving real business problems using innovative AI/machine learning approaches. He received his Ph.D. from the National University of Singapore.

Graph Neural Networks for Real World Fraud Detection

Florian Jacta

Specialist of Taipy, a low-code open-source Python package enabling Python developers to develop a production-ready AI application quickly. Package pre-sales and after-sales function.
Data Scientist for Groupe Les Mousquetaires (Intermarche) and ATOS.
Developed several Predictive Models as part of strategic AI projects.
Master in Applied Mathematics from INSA, Major in Data Science and Mathematical Optimization.

Turning your Data/AI algorithms into full web applications in no time with Taipy

Fokko Driesprong

Open Source enthusiast. Committer on Avro, Parquet, Druid, Airflow and Iceberg. Apache Software Foundation member.

PyIceberg: Tipping your toes into the petabyte data-lake

Francesco Bruzzesi

Data scientist at HelloFresh with a background in pure mathematics.
Open source enthusiast and ML practitioner.

Bayesian ranking for tennis players in PyMC

Guilherme Penedo

Keynote "Processing billions of tokens for training Large Language Models, tools and knowledge"

Hadi Abdi Khojasteh

Hadi is an R&D senior machine learning engineer at the Deltatre group, where he is an integral member of the innovation lab and a fellow at the Sport Experiences unit, based in Czechia and Italy. With a solid academic background, Hadi is a former lecturer at the Institute for Advanced Studies in Basic Sciences (IASBS) in Iran and a former researcher at the Institute of Formal and Applied Linguistics (ÚFAL) at Charles University in Prague. Throughout his career, he has actively participated in numerous industrial projects, collaborating closely with renowned experts in the fields of CV/NLP/HLT/CL/ML/DL. His research focuses on multimodal learning inspired by neural models that are both linguistically motivated and tailored to language and vision, visual reasoning and deep learning. His main research interests are Machine Learning, Deep Learning, Computer Vision, Multimodal Learning and Visual Reasoning, and he is experienced in a wide variety of international projects on cutting-edge technologies.

Distillation Unleashed: Domain Knowledge Transfer with Compact Neural Networks

Hannes Mühleisen

Prof. Dr. Hannes Mühleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs, a consulting company providing services around DuckDB. He is also a senior researcher of the Database Architectures group at the Centrum Wiskunde & Informatica (CWI), the Dutch national research lab for Mathematics and Computer Science in Amsterdam. Hannes is also Professor of Data Engineering at Radboud Universiteit Nijmegen. His main interest is analytical data management systems.

In-Process Analytical Data Management with DuckDB

Ines Montani

Ines Montani is a developer specializing in tools for AI and NLP technology. She’s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.

There are no bad labels, only happy accidents

Jakob Willisch

As Head of Product Data at Babbel, I lead data-scientists, analysts and engineers to improve decision-making of people and machines. Before joining Babbel I did quantitative research in Political Science and Political Economy.

The proof of the pudding is in the (way of) eating: quasi-experimental methods of causal inference and their practical pitfalls

James Powell

Cumulative Index Max in pandas

Jeroen Rombouts

Jeroen is an expert in machine learning and AI, specializing in transforming ideas and proof-of-concepts into value-driven products. Leveraging deep expertise in data science and engineering, he offers practical solutions to enhance machine learning infrastructure and elevate data teams' AI skills.

From Vision to Action: Designing and Deploying Effective Computer Vision Pipelines

Jetze Schuurmans

Jetze is a well-rounded Machine Learning Engineer, who is as comfortable solving Data Science use cases as he is productionizing them in the cloud. His expertise includes MLOps, Systems Design and Cloud Engineering. In his past role as a researcher, he has published papers in different domains: Computer Vision and Natural Language Processing.

Designing a Machine Learning System

Jon Smith

Jon Smith is a Senior Machine Learning Scientist at Booking.com, having spent his time working in fraud detection and performance marketing. In these areas, he focusses on strengthening software practices within critical ML systems, through evangelising code quality and unit testing.

He studied Mathematics and Computer Science at Acadia University and Simon Fraser University in Canada, and spent some time as a Machine Learning Engineer at the Canadian Broadcasting Corporation.

Tables as Code: The Journey from Ad-hoc Scripts to Maintainable ETL Workflows at Booking.com

Jordi Smit

Hi! My name is Jordi Smit. I’m deeply passionate about software engineering, data science, and automation. Nothing makes me happier than creating software that helps humans by automating a tedious and manual-intensive part of their job. Therefore, I love discussing data science since this field has opened the door to many new kinds of automation. However, data science solutions often stay stuck at the proof of concept level. To combat this issue, you also need software engineering knowledge. That is why I love the intersection between software engineering, data science, and automation.

I work as a Machine Learning Engineer at Xebia Data in Amsterdam. Here, I help companies to transform their ML-based models into production-ready applications. I love this job because it allows me to explore the intersection between software engineering and data science daily.

LLM Agents 101: How I Gave ChatGPT Access to My To-Do List

Joris Van den Bossche

I am a core contributor to Pandas and Apache Arrow, and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of Python (pandas) at Ghent University.

What the PDEP? An overview of some upcoming pandas changes

Jorrick Sleijster

Jorrick is a Data Platform Engineer at Adyen. With a background in computer science, his focus has been on introducing and maintaining tools on the data platform. On the side, Jorrick is an active open-source contributor to pet projects and Apache Airflow. One of his contributions was awarded PR-of-the-month by the Apache Airflow project.

Achieving developer autonomy on on-premise data clusters using Kubernetes.

Judith van Stegeren

Judith van Stegeren is a researcher, engineer and consultant. Together with Yorick van Pelt, she runs Datakami, a consulting firm specialized in large language models.

Previously, she worked as a machine learning engineer at Floryn, a Dutch fintech startup that finances small and medium businesses. Before that, she researched text generation at the University of Twente and worked as an information security specialist at the Dutch National Cyber Security Centre in the Hague.

Judith holds a PhD in computer science from the University of Twente. Her interests include investing, natural language processing, procedural art and video games.

Building a personal search engine with llama-index

Julien Launay

Keynote "Processing billions of tokens for training Large Language Models, tools and knowledge"

Jurriaan Nagelkerke

Data Scientist with 15+ years of experience in getting value out of data for various companies in different branches. Love to apply the right ML/AI techniques to answer business questions and actually make a difference. Aside from being a hands-on consultant, I'm also a trainer in various ML techniques. In the last few years, a strong focus on textual data / NLP and transformer models / LLMs.

Revealing the True Motives of News Readers

Katharine Jarmul

Katharine Jarmul is a privacy activist and data scientist whose work and research focuses on privacy and security in data science workflows. She recently authored Practical Data Privacy for O'Reilly and works as a Principal Data Scientist at Thoughtworks. Katharine has held numerous leadership and independent contributor roles at large companies and startups in the US and Germany -- implementing data processing and machine learning systems with privacy and security built in and developing forward-looking, privacy-first data strategy.

Keynote "AI Without Dystopia"

Katharine Jarmul

Encrypted Computation: What if decryption wasn't needed?

Kemal Tugrul Yesilbek

Kemal is a Technical Lead in a data science team at ABN AMRO. He studied software engineering and machine learning. During his time in academia, he published machine learning solutions approaching human-level performance. Kemal started his career as a data scientist. He founded Elify.io, a skill assessment tool for data-driven roles, which resulted in an exit. He worked as a machine learning engineer in the past years, delivering end-to-end machine learning-backed solutions. In his current role, he is the technical lead of the CISO data science team of ABN AMRO, keeping the attackers at bay with machine learning!

Unconference: Interviews: Tips and Stories from Both Sides

Kevin Klein

Kevin is a Data Scientist at QuantCo, working on fraud detection, risk modelling and experimentation. Prior to joining QuantCo, he focused on Natural Language Processing, discrete optimization and Bayesian optimization during his Computer Science major at ETH Zurich.
He's not very original in that he likes functional programming, running and writing.

Causal Inference Libraries: What They Do, What I'd Like Them To Do

Krishi Sharma

Krishi Sharma is a software developer at KUNGFU.AI where she builds software applications that power machine learning models and deliver data for a broad range of services. As a former data scientist and machine learning engineer, she is passionate about building tools that ease the infrastructure dependencies and reduce potential technical debt around handling data. She helped build and maintains an internal Python tool, Potluck, which lets machine learning engineers bootstrap a containerized, production-ready application with data pipelining templates so that her team can focus on the data and metrics without squandering too much time finagling with deployment and software.

Innovation in the Age of Regulation: Federated Learning with Flower

Laura Summers

Laura is a Design Engineer and Prodigy Teams Product Lead at Explosion AI.

She is the founder of Debias AI (debias.ai) and the human behind Sweet Summer Child Score (summerchild.dev), Ethics Litmus Tests (ethical-litmus.site), fairXiv (fairxiv.org), and the Melbourne Fair ML reading group (groups.io/g/fair-ml). Laura is passionate about feminism, digital rights and designing for privacy. She speaks, writes and runs workshops at the intersection of design and technology.

Ok, doomer

Lieke Kools

Lieke is lead data scientist at Sensorfact, a company aiming to eliminate all industrial energy waste for SMEs. In her role she focusses on the data-fueled products that help their consultants to efficiently and effectively give advice to customers. Before joining Sensorfact she worked as a data science consultant at Vantage AI and completed a PhD in econometrics.

Standby detection with a human in the loop

Maarten Oude Rikmanspoel

I love working with both technology and people. Currently working as a freelance data engineer and business intelligence specialist to satisfy the tech part of my heart. Fell in love with Python and the PyData modules in 2017 after unsuccessful relationships with Java and C++ in the past. Applying this in a variety of industries and companies.

In parallel, I’ve been building CalmCode.nl for the past 1.5 years with the aim of guiding software developers, IT and data specialists towards less stress and burnout. I’ve seen too many bad examples in the larger companies and multinationals where developers looked almost oppressed instead of being able to do their work properly and in a nice environment. So the people-oriented part of my heart gets fuelled when I see people grow and take control of their lives again.

“We’re all just walking each other home.” Ram Dass

import full-focus as ff – How to reduce stress and pressure as a data specialist.

Maarten Sukel

Maarten is a Data Scientist at Picnic Technologies, working mostly on Demand Forecasting and running machine learning at scale. Meanwhile at the University of Amsterdam, he works on research into the use of multimodal approaches for a range of applications.

Multimodal Product Demand Forecasting: From pixels on your screen to a meal on your plate

Mark Raasveldt

CTO at DuckDB Labs

In-Process Analytical Data Management with DuckDB

Maryam Miradi

Maryam Miradi is AI and Data Science Lead at Transactie Monitoring Nederland (TMNL). She has a PhD in Artificial Intelligence Deep Learning, specialised in NLP and Computer Vision from Delft University of Technology. Over the last 15 years, she has developed different AI solutions for organisations such as Ahold-Delhaize, Belastingdienst, Alliander, Stedin and ABN AMRO.

Deep look into Deepfakes: Mastering Creation, Impact, and Detection

Maël Deschamps

Manager Machine Learning Engineer, I lead the MLOps Expertise in a team of 20+ Data Engineers & Data Scientists. During my time between Shanghai and Amsterdam I explored 15+ projects for 10+ clients working in various industries.

I find great joy in making both my teams and clients happy. I believe in management through empathy and transparency and I'm passionate about Data Sustainability and all its related technical challenges.
Feel free to reach out to discuss any of those topics.

Our journey using data and AI to help monitor wildlife in parks in Africa

Nazli M. Alagoz

I am a quantitative researcher and data scientist with a strong background in marketing, economics, and econometrics. My focus is on using data-driven approaches to tackle complex business challenges, uncover valuable insights, and drive impactful decisions. As a Ph.D. candidate in quantitative marketing, I specialize in causal inference, machine learning, and experimental design to address cutting-edge research questions.

Staggered Difference-in-Differences in Practice: Causal Insights from the Music Industry

Niek IJzerman

Niek is a data scientist at the City of Amsterdam, part of the dedicated Urban Innovation and R&D Team. Niek is a recent graduate from the MSc AI program at the UvA and currently focusses on automated asset management in 3D using AI and Data Science.

Using AI to make Amsterdam greener, safer and more accessible

Niklas Amberg

I am a Data Scientist at REWE Group, using my background in information systems to solve complex problems with data-driven approaches. I am passionate about programming in Python and continuously seek to enhance my skills. Additionally, I am interested in exploring cloud technologies and their applications in the field of data science.

A data-driven approach for distributing scarce goods within the REWE retail supply chain

Okke van der Wal

Leading the Payments Machine Learning team at Uber working on Anomaly Detection, Personalization & Fraud Detection within the Onboarding and Checkout experiences at Uber using Contextual Bandits, Uplift Modelling & Reinforcement Learning.

Personalization at Uber scale via causal-driven machine learning

Oriol Abril Pla

Oriol (he/him/ell) is a computational statistician with a passion for open source, teaching and community building. He currently works as open source maintainer of ArviZ and PyMC, both Python libraries related to Bayesian modeling and sponsored by NumFOCUS. He is also an instructor at Intuitive Bayes and has taught a couple undergrad courses on maths and statistics as external lecturer.

He has led both virtual and in-person workshops and talks at Data Umbrella, PyDataBCN and PyData Global. He is also involved in PyMCon organization.

Uncertainty visualization with ArviZ

Panos Alexopoulos

Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, contributing to building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain.

Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book "Semantic Modeling for Data - Avoiding Pitfalls and Breaking Dilemmas" (O'Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.

Mastering Knowledge Graph Modeling with Neo4j: A Practical Tutorial

Paul Zhutovsky

Transfer Learning in Boosting Models

Ramon Perez

Hello! I'm Ramon, a data scientist, researcher, and educator living in Sydney. I currently work as a freelance data professional and was previously a Senior Product Developer at Decoded, a technology education company based in the UK. While at Decoded, I created custom data science tools, workshops, and training programs for clients in industries ranging from retail to finance. Prior to that, I held roles at the intersection of education, data science, and research in the areas of entrepreneurship and strategy, alongside a few ventures in consumer behavior and development economics research in industry and academia, respectively. On the personal side, I enjoy giving talks and technical workshops and have had the privilege of participating in several conferences such as PyCon, SciPy, PyData, and countless meetup events. In my spare time, I spend as much time as possible mountain biking and exploring many of the outdoor wonders Australia has to offer.

Unlocking the Black Box: A practical guide to finding an alibi for machine learning models

Raphael de Brito Tamaki

Data Science Lead in Marketing Science @Meta, where I use causal inference techniques to extract insights to help advertisers increase their marketing performance. Prior to joining Meta, I worked at Wildlife Studios - a mobile game studio with over 2B total downloads - where I was the Tech Lead for the Lifetime Value (LTV) prediction team, and implemented and maintained LTV models in production for over 10 games.

Forecasting Customer Lifetime Value (CLTV) for Marketing Campaigns under Uncertainty with PySTAN

Riccardo Amadio

Senior Data Engineer at Agile Lab with a background as a Data Scientist and Software Engineer.
When I don't work with data pipelines, I juggle between closing some of my 100+ open tabs on the browser and my true passion: collecting stars on GitHub 🔭🌟. In this treasure trove of more than 2,000 repositories, I am pretty sure I can find any tool to solve a problem, and I can’t wait to share them with you.

Declarative data manipulation pipeline with Dagster

Rik van der Vlist

Rik is a machine learning engineer with a strong foundation in electrical engineering and a specialization in leveraging electricity data for smart use cases. With previous experience at Eneco, he has focused on delivering automated home energy insights to a large group of customers. Currently, Rik is dedicated to constructing a scalable forecasting model for a sustainable electricity grid, combining his passion for data science and sustainable solutions. He thrives on creating value and generating insights from raw data, demonstrating his proficiency in building robust and scalable data pipelines using Spark and Python.

Balancing the electricity grid with multi-level forecasting models

Ritchie Vink

Ritchie Vink is the author of the Polars query engine/DataFrame library and the CEO/Co-Founder of Polars the company.

He originally has a background in Civil Engineering, but he switched fields and has most of his work experience in machine learning and software development. Though what truly matters in experience is what he did in his side projects.

Polars and a peek in the expression engine

Robert G. Erdmann

Keynote - Python for Imaging and Artificial Intelligence in Cultural Heritage

Robert Grimm

Data Scientist at REWE Group. Previously worked as Data Scientist in the chemical industry (Covestro AG). Prior to working in industry, earned a PhD in Computational Psycholinguistics (University of Antwerp).

A data-driven approach for distributing scarce goods within the REWE retail supply chain

Roy van Santen

I'm a Software Engineer who gradually moved into the Machine Learning space. My passion is in building high quality and highly maintainable software systems. I put great effort into CI/CD, layered testing, monitoring/alerting and pay special attention to communicating my ideas, ensuring everyone is on board.
To do this, I like using an RFC-like process with design docs that serve as documentation for designing software systems.

Designing a Machine Learning System

Shayla Jansen

Shayla is a data scientist at the City of Amsterdam, part of the dedicated Urban Innovation and R&D Team which aims to improve the livability of Amsterdam by bringing AI research to the city.

Using AI to make Amsterdam greener, safer and more accessible

Simone Gayed Said

Hello Hello! 🌟 I'm Simone, I work as a Machine Learning engineer, and I'm all about using my skills to make a positive impact on the World! 🚀✨

Our journey using data and AI to help monitor wildlife in parks in Africa

Thomas Wolf

Thomas Wolf is a co-founder and Chief Science Officer at Hugging Face. He is passionate about creating open-source software that makes complex research accessible, and most proud of creating the Transformers and Datasets libraries as well as the Magic-Sand tool. When he’s not building OSS, he pushes for open science in AI/ML research, trying to narrow the gap between academia and industrial labs. His current research interests are centered around overcoming the current limitations of LLMs with multi-modalities and complementary approaches.

Keynote "Processing billions of tokens for training Large Language Models, tools and knowledge"

Till Döhmen

Till Döhmen is a Research Engineer at Hopsworks, where he is contributing to the development of Hopsworks' Python-centric Feature Store platform. In addition to his work at Hopsworks, he is a guest researcher at the Intelligent Data Engineering Lab of the University of Amsterdam and engages in research at the intersection of data management and machine learning.

MLOps on the fly: Optimizing a feature store with DuckDB and ArrowFlight

Tingting Qiao

Senior data scientist at Adyen, working in the Score team with a focus on fraud detection.

Has a PhD background in computer vision and natural language processing using deep neural networks. Familiar with prediction models such as regression and classification models, as well as recent research techniques such as adversarial learning and neural networks. Several years of experience with popular deep learning frameworks.

Graph Neural Networks for Real World Fraud Detection

Tony Bagnall

Tony is a Professor of Computer Science at the University of East Anglia, where he leads the time series machine learning group. His primary research interest is in time series machine learning, with a historic focus on classification, but more recently looking at clustering and regression. He has a side interest in ensemble design.

Lets do the time warp again: time series machine learning with distance functions

Vicki Boykis

Vicki Boykis works on end-to-end ML applications. Her interests include the intersection of information retrieval and large language models, applying engineering best practices to machine learning, and Nutella. She works at Duo Security and she lives in Philadelphia with her family. Her favorite hobby was making terrible jokes on Twitter when it was still good. She recently wrote a deep dive on embeddings and put together Normconf, celebrating normcore workflows in ML.

Keynote "Build and keep your context window"

Vincent Smeets

Hi, my name is Vincent Smeets. I am one of the data scientists within the Data And Customer Analytics department at DPG Media. I am responsible for generating insights from structured and semi-structured data to support decision making within the B2C Marketing organisation. In my free time I love skateboarding, tennis and running.

Revealing the True Motives of News Readers

Vincent Warmerdam

Vincent D. Warmerdam is a software developer and senior data person. He currently works at Explosion on data quality tools for developers. He’s also known for creating calmcode.io as well as a bunch of open source projects. You can check out his blog over at koaning.io to learn more about those.

Keynote "Natural Intelligence is All You Need [tm]"

Vincent Warmerdam

There are no bad labels, only happy accidents

Wesley Boelrijk

Wesley is the Lead Machine Learning Engineer at Xccelerated (part of Xebia). There, he trains and guides junior-to-medior ML Engineers in Xccelerated's one-year program. Besides that, he works as an MLE on various projects, recently at KLM, ProRail, and Port of Rotterdam. In his free time, he likes to stay up-to-date in the ML ecosystem and play around with computer vision.

From Vision to Action: Designing and Deploying Effective Computer Vision Pipelines

Wessel Sandtke

Typewriter repairman turned Machine Learning Engineer, now working for Bookarang, a Dutch startup that works with Dutch libraries to improve the recommendations for their members.
Wrote several picture books, but is not allowed to boost those in the recommendation system.

Don’t judge a book by its cover: Using LLM created datasets to train models that detect literary features

Yorick van Pelt

Yorick is a consultant at Datakami doing generative AI for finance and creativity. He also has years of experience consulting in data science, ops and functional programming.

Building a personal search engine with llama-index

Ziad Al Moubayed

Staff Engineer @ Adyen. I am passionate about high performance distributed systems. Recently I was working on scaling Adyen's Data & ML infrastructure.

Reliable and Scalable ML Serving: Best Practices for Online Model Deployment

Ildar Safilo

Machine Learning Scientist at Booking.com.
An experienced manager in MLE/DS/SE/DA, I possess extensive expertise in machine learning, analytics, and software engineering. I excel at leading teams to create groundbreaking businesses and delivering innovative solutions for real-world business cases across various industries, including IT, banking, telecommunications, marketplaces, game development, shops, travel tech and streaming platforms.
Expert in building recommendation and ranking systems, as well as personalization automation with machine learning, and advanced A/B testing.
Co-author and lecturer of a popular online course on recommender system development with over 1000 students.
Co-author of an open-source Python library called RecTools, specifically designed for building recommender systems. The library is hosted on GitHub and has received widespread recognition and adoption in the industry.
Graduated with a Master’s degree in Mathematics and Computer Science and has over 6 years of experience in data science.

Mastering Recommendation Systems Evaluation: An A/B Testing Approach with Insights from the Industry

sktime community

(1 speaker will attend if accepted, exact speaker tbd)

Probabilistic predictions: probabilistic forecasting with sktime and probabilistic regression with skpro