PyData Amsterdam 2023
What can we learn from engineering, the history of machine learning, fantasy books, the early 1990s internet, and art history about how to be successful engineers in the modern-day data landscape? We’ll learn together in this talk.
Are you a machine learning practitioner struggling to design, reason, and communicate about ML systems? Then this session is for you! With the industry moving towards end-to-end ML teams that implement MLOps practices, it is paramount for you to understand ML from a systems perspective. In this hands-on session, you will gain a thorough understanding of the technical intricacies of designing valuable, reliable, and scalable ML systems.
Discover how to bridge the gap between traditional machine learning and the rapidly evolving world of AI with skorch. This package integrates the Hugging Face ecosystem while adhering to the familiar scikit-learn API. We will explore fine-tuning of pre-trained models, creating our own tokenizers, accelerating model training, and leveraging Large Language Models.
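As a flavour of the sklearn-compatible API, here is a minimal sketch using skorch's NeuralNetClassifier; the toy network and random data are illustrative, not the Hugging Face fine-tuning example from the talk.

```python
# A minimal sketch of skorch's scikit-learn-style API; toy network and random data.
import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, X):
        return self.net(X)

X = np.random.randn(100, 20).astype("float32")
y = np.random.randint(0, 2, size=100).astype("int64")

# behaves like any scikit-learn estimator: fit / predict / GridSearchCV all work
clf = NeuralNetClassifier(MLP, criterion=nn.CrossEntropyLoss, max_epochs=5, lr=0.01)
clf.fit(X, y)
print(clf.predict(X[:5]))
```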
In this talk, we discuss how we can use the Python package PyStan to estimate the Lifetime Value (LTV) of the users that can be acquired from a marketing campaign, and use this estimate to find the optimal bidding strategy when the LTV estimate itself has uncertainty. Throughout the presentation, we highlight the benefits of using Bayesian modelling to estimate LTV, and the potential pitfalls when forecasting LTV. By the end of the presentation, attendees will have a solid understanding of how to use PyStan to estimate LTV, optimize their marketing campaign bidding strategies, and implement the best Bayesian modelling solution. All of the contents and numbers in this presentation can be found in the shared Git repository.
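As a taste of the workflow, here is a minimal sketch assuming PyStan 3's build/sample API; the toy conversion-rate model stands in for the (more involved) LTV model discussed in the talk.

```python
import stan

model_code = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> converted;
}
parameters {
  real<lower=0, upper=1> p;
}
model {
  p ~ beta(1, 1);
  converted ~ bernoulli(p);
}
"""
data = {"N": 10, "converted": [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]}

posterior = stan.build(model_code, data=data, random_seed=1)
fit = posterior.sample(num_chains=4, num_samples=1000)
print(fit["p"].mean())  # posterior mean of the toy conversion rate
```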
Last year, the pandas community adopted a new process for making significant changes to the library: the Pandas Enhancement Proposals, aka PDEPs. In the meantime, several such proposals have been put forward and discussed, and some have already been accepted. This talk will give an overview of some of the behavioural changes you can expect as a pandas user.
This talk will explore the Python tooling and ecosystem for estimating conditional average treatment effects (CATEs) in a Causal Inference setting. Using real-world examples, it will compare and contrast the pros and cons of various existing libraries as well as outline desirable functionalities not currently offered by any public library.
In the world of computer vision, the focus is often on cutting-edge neural network architectures. However, the true impact usually lies in designing a robust system around the model to solve real-world business challenges. In this talk, we guide you through the process of building practical computer vision pipelines that leverage techniques such as segmentation, classification, and object tracking, demonstrated by our predictive maintenance application at Port of Rotterdam. Whether you're an experienced expert seeking production-worthy pipelines or a novice with a background in data science or engineering eager to dive into image and video processing, we will explore the use of open-source tools to develop and deploy computer vision applications.
Numerous packages exist within the Python open-source ecosystem for algorithm building and data visualization. However, a significant challenge persists, with over 85% of Data Science Pilots failing to transition to the production stage.
This talk introduces Taipy, an open-source Python library for front-end and back-end development. It enables Data Scientists and Python Developers to create pilots and production-ready applications for end-users.
Its syntax facilitates the creation of interactive, customizable, and multi-page dashboards with augmented Markdown. Without the need for web development expertise (no CSS or HTML), users can generate highly interactive interfaces.
Additionally, Taipy is engineered to construct robust and tailored data-driven back-end applications. Intuitive components like pipelines and data flow orchestration empower users to organize and manage data effectively. Taipy also introduces a unique Scenario Management functionality, facilitating "what-if" analysis for data scientists and end-users.
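To give a feel for the augmented-Markdown syntax, here is a minimal sketch assuming Taipy's taipy.gui API; the page content is purely illustrative.

```python
from taipy.gui import Gui

value = 50  # module-level variable bound into the page below

page = """
# Sales dashboard
Select a threshold: <|{value}|slider|min=0|max=100|>

Current threshold: <|{value}|text|>
"""

Gui(page=page).run()
```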
Having worked on ML serving for a couple of years, we have learned a lot. I would like to share a set of best practices and learnings with the community.
The Difference-in-Differences (DiD) methodology is a popular causal inference method utilized by leading tech firms such as Microsoft Research, LinkedIn, Meta, and Uber. Yet recent studies suggest that traditional DiD methods may have significant limitations when treatment timings differ. An effective alternative is the implementation of the staggered DiD design. We exemplify this by investigating an interesting question in the music industry: Does featuring a song in TV shows influence its popularity, and are there specific factors that could moderate this impact?
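For readers who want the notation, the canonical two-way fixed-effects specification that staggered adoption complicates can be sketched as follows (standard textbook notation, not necessarily the exact model used in the talk):

```latex
Y_{it} = \alpha_i + \lambda_t + \delta \, D_{it} + \varepsilon_{it}
```

Here \alpha_i are unit fixed effects, \lambda_t are time fixed effects, and D_{it} is the treatment indicator; with staggered treatment timing, the OLS estimate of \delta becomes a weighted average of group-time effects in which some weights can even be negative, which is the problem staggered DiD estimators are designed to address.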
Until a few years ago, data science & engineering at Booking.com had grown largely in an ad-hoc manner. This growth has led to a labyrinth of unrelated scripts representing Extract-Transform-Load (ETL) processes. Without options for quickly testing cross-application interfaces, maintenance and contribution grew unwieldy, and debugging in production was a common practice.
Over the past several years, we’ve spearheaded a transition from isolated workflows to a well-structured community-maintained monorepo - a task that required not just technical adaptation, but also a cultural shift.
Central to this transformation is the adoption of the concept of "tables as code", an approach that has changed the way we write ETL. Our lightweight PySpark extension represents table metadata as a Python class, exposing data to code, and enabling efficient unit test setup and validation.
In this talk, we walk you through “tables as code” design and complementary tools such as efficient unit testing, robust telemetry, and automated builds using Bazel. Moreover, we will cover the transformation process, including enabling people with non-engineering backgrounds to create fully tested and maintainable ETL. This includes internal training, maintainers, and support strategies aimed at fostering a community knowledgeable in best practices.
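To make the idea concrete, here is a hypothetical sketch of what a "table as code" class might look like; the names and API are illustrative, not Booking.com's actual PySpark extension.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

class Reservations:
    """Table metadata that lives next to the transformation code."""
    name = "core.reservations"
    schema = "reservation_id STRING, hotel_id STRING, amount DOUBLE, created_at DATE"

    @classmethod
    def read(cls, spark: SparkSession) -> DataFrame:
        return spark.table(cls.name)

    @classmethod
    def mock(cls, spark: SparkSession, rows) -> DataFrame:
        # build a small in-memory DataFrame with the production schema for unit tests
        return spark.createDataFrame(rows, schema=cls.schema)

def daily_revenue(reservations: DataFrame) -> DataFrame:
    return reservations.groupBy("created_at").agg(F.sum("amount").alias("revenue"))
```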
Learn how to visualize uncertainty in parameters or predictions using multiple visualizations adapted to your data and task.
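As one common example, here is a minimal sketch of a prediction band drawn with matplotlib; the forecast and interval values are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(30)
prediction = np.sin(x / 5) + 0.1 * x
lower, upper = prediction - 0.5, prediction + 0.5   # e.g. a 90% prediction interval

plt.plot(x, prediction, label="prediction")
plt.fill_between(x, lower, upper, alpha=0.3, label="90% interval")
plt.legend()
plt.show()
```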
Existing book recommendation systems like Goodreads are based on correlating the reading habits of people. But what if you want a humorous book? Or a book that is set in 19th century Paris? Or a thriller, but without violence?
We build book recommendation systems for Dutch libraries based on more than a dozen features, from historical setting to writing style to the traits of the main characters. This allows us to tailor each recommendation to individual readers.
Fraud is a major problem for financial services companies. As fraudsters change tactics, our detection methods need to get smarter. Graph neural networks (GNNs) are a promising model to improve detection performance. Unlike traditional machine learning models or rule-based engines, GNNs can effectively learn from subtle relationships by aggregating neighborhood information in the financial transaction networks. However, it remains a challenge to adopt this new approach in production.
The goal of this talk is to share best practices for building a production-ready GNN solution and hopefully spark your interest to apply GNNs to your own use cases.
DuckDB is a novel analytical data management system. DuckDB supports complex queries, has no external dependencies, and is deeply integrated into the Python ecosystem. Because DuckDB runs in the same process, no serialization or socket communication has to occur, making data transfer virtually instantaneous. For example, DuckDB can directly query Pandas data frames faster than Pandas itself. In our talk, we will describe the value DuckDB brings to users and how it can improve their day-to-day work through automatic parallelization, efficient operators, and out-of-core operations.
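As a minimal sketch of querying an in-process pandas DataFrame with DuckDB (the data is illustrative):

```python
import duckdb
import pandas as pd

df = pd.DataFrame({"city": ["Amsterdam", "Utrecht", "Amsterdam"], "sales": [10, 5, 7]})

# DuckDB sees the DataFrame by name in the same process, no copying or serialization needed
result = duckdb.sql("SELECT city, SUM(sales) AS total FROM df GROUP BY city").df()
print(result)
```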
Probabilistic predictions are predictions that include some statements about uncertainty of the prediction, e.g., prediction intervals that make statements about a likely range of values that a prediction can take.
This workshop gives an introduction on making probabilistic predictions with the sktime and skpro python packages, for forecasting and supervised regression. Both packages are sklearn-compatible, built using skbase, with composable and modular interfaces.
The presentation includes a practical primer of different types of probabilistic predictions, algorithms and estimators, and evaluation workflows, with python code examples.
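As a small appetizer, here is a minimal sketch assuming sktime's probabilistic forecasting API; the dataset and forecaster are illustrative.

```python
from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

y = load_airline()
forecaster = ThetaForecaster(sp=12)
forecaster.fit(y, fh=[1, 2, 3])

# point forecasts plus a 90% prediction interval
print(forecaster.predict())
print(forecaster.predict_interval(coverage=0.9))
```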
Bored of your old pipeline orchestrator? Finding it hard to tell whether your data is up to date? Struggling with the development workflow of your data pipelines?
Dagster, an open-source tool, offers a unique paradigm that simplifies the orchestration and management of data pipelines.
By adopting declarative principles, data engineers and data scientists can build scalable, maintainable, and reliable pipelines effortlessly.
We will commence with an introduction to Dagster, covering its fundamental concepts to ensure a comprehensive understanding of the material.
Subsequently, we will explore practical scenarios and use cases, also using dbt to harness the power of SQL.
Minutes 0-5: The design-pattern problems of current data pipeline frameworks.
Minutes 5-15: Introduction to Dagster and its core concepts.
Minutes 15-25: Practical examples of building declarative data pipelines with Dagster, also using dbt and the power of the gRPC server.
Minutes 25-30: Q&A and conclusion.
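To preview the declarative, asset-based style, here is a minimal sketch of Dagster's software-defined assets; the assets themselves are illustrative.

```python
import pandas as pd
from dagster import asset, materialize

@asset
def raw_orders() -> pd.DataFrame:
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})

@asset
def total_revenue(raw_orders: pd.DataFrame) -> float:
    # depends on raw_orders simply by naming it as a parameter
    return float(raw_orders["amount"].sum())

if __name__ == "__main__":
    materialize([raw_orders, total_revenue])
```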
How can you evaluate your production models when the data is not structured and you have no labels? To start, by tracking patterns and changes in the input data and model outputs. In this talk, I will give an overview of the possible approaches to monitor NLP and LLM models: from embedding drift detection to using regular expressions.
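As one simple drift signal for unstructured data, here is a minimal sketch comparing the mean embedding of a reference window against the current window; the random embeddings are placeholders for whatever encoder you use in production.

```python
import numpy as np
from scipy.spatial.distance import cosine

rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 384))           # embeddings of past inputs
current = rng.normal(loc=0.2, size=(1000, 384))    # embeddings of recent inputs

drift_score = cosine(reference.mean(axis=0), current.mean(axis=0))
print(f"cosine distance between mean embeddings: {drift_score:.3f}")
```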
Pick your next hot LLM prompt using a Bayesian tournament! Get a quick LLM dopamine hit with a side of decision theory vegetables. It's Bayesian Thunderdome: many prompts enter, one prompt leaves.
Data scientists in industry often have to wear many hats. They must navigate statistical validity, business acumen and strategic thinking, while also representing the end user. In this talk, we will discuss the pillars that make a metric the right one for the job, and how to choose appropriate Key Performance Indicators (KPIs) to drive product success and strategic gains.
With Apache Iceberg, you store your big data in the cloud as files (e.g., Parquet), but then query it as if it’s a plain SQL table. You enjoy the endless scalability of the cloud, without having to worry about how to store, partition, or query your data efficiently. PyIceberg is the Python implementation of Apache Iceberg that loads your Iceberg tables into PyArrow (pandas), DuckDB, or any of your preferred engines for doing data science. This means that with PyIceberg, you can tap into big data easily by only using Python. It’s time to say goodbye to the ancient Hadoop-based frameworks of the past! In this talk, you'll learn why you need Iceberg, how to use it, and why it is so fast.
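As a small sketch of what this looks like in practice, assuming PyIceberg's catalog/scan API; the catalog name, table name, and filter are illustrative.

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")                 # assumes a configured catalog named "default"
table = catalog.load_table("analytics.events")    # hypothetical namespace.table

# push down a filter and column selection, then read into pandas via PyArrow
df = (
    table.scan(row_filter="event_date >= '2023-09-01'",
               selected_fields=("user_id", "event_type"))
         .to_pandas()
)
print(df.head())
```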
Every news consumer has needs and in order to build a true bond with your customer it is vital to meet these, sometimes, diverse needs. To achieve this, first of all, it is important to identify the overarching needs of users; the reason why they read news. The BBC conducted research to determine these needs and identified six distinct categories: Update me, Keep me on trend, Give me perspective, Educate me, Divert me, and Inspire me. Their research showed that an equal distribution of content across these user needs will lead to higher customer engagement and loyalty. To apply this concept within DPG Media, we started building our own user needs model. Through various iterations of text labelling, text preparation, model building, fine-tuning and evaluation, we have arrived at a BERT model that is capable of determining the associated user needs based solely on the article text.
You are a data scientist or a machine learning engineer, and you applied for a position. You thought your interview went well, but still got a negative response... What might have gone wrong? In this talk, we will explore how things may go wrong on both the applicant's and the interviewer's side, and what you can do about it.
For many people, a museum is the last place they would expect to find cutting-edge data science, but the world of cultural heritage is full of fascinating challenges for imaging and computation. The availability of high-resolution imaging, high-speed internet, and modern computational tools allows us to image cultural heritage objects in staggering detail and with a wide array of techniques. The result, though, is a data deluge: studying single objects like Rembrandt's Night Watch can generate terabytes of data, and there are millions of objects in the world's museums.
The huge Python ecosystem enables us to build tools to process, analyze, and visualize these data. Examples include creating the 717 gigapixel (!) image of the Night Watch and reconstructing the painting's long-lost missing pieces using AI; controlling a camera and automated turntable in Jupyter for 3D object photography; revealing hidden watermarks in works on paper using a hybrid physics and deep learning-based ink-removal model; using chemical imaging and convolutional neural networks to see the hidden structure of Rembrandt and Vermeer paintings; and using a webcam or smartphone camera to do real-time similarity search over a database of 2.3 million open-access cultural heritage images at 4 frames per second.
These and several other live demonstrations show how Python is essential in our work to help the world access, preserve, and understand its cultural heritage.
Drinks hosted by Kickstart AI at Lowlander Botanical Bar and Kitchen [located next to the venue]. Note that spots are limited, so make sure to sign up to secure yours! Sign up via https://meetup-september14.kickstartai-events.org/
Many of us have heard terms like Data for Good, Ethical Machine Learning, Human-Centric Product Design, but those words also bring forward questions -- if we need "Ethical ML" what is the rest of machine learning? The current conversation around AI Doom paints a picture where AI goes hand-in-hand with dystopian outcomes. In this keynote, we'll explore what AI could look like if at the core, it was led by these ideals. What if distributed, communal machine learning were a central focus? What if privacy and user choice were a part of our everyday machine learning frameworks? What if aid organizations, governments, coalitions helped shape the problems for AI research? Let's ponder these questions and their outcomes together, imagining AI without the potential for dystopia.
Global political circumstances and unpredictable crises such as the Covid-19 pandemic can cause a scarcity of grocery goods within the retail supply chain. We present a data-driven approach to ensure a fair and replicable distribution from the supplier to the retail warehouses at REWE, one of the largest grocery chains in Germany.
With the rise of data privacy concerns around AI in the EU, how can we innovate using AI capabilities despite regulations around consumer data? What tools and features are available to help us build AI in regulated industries? This talk will discuss how we can leverage diverse datasets to build better AI models without ever having to touch the datasets by using a Python library called Flower.
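As a minimal sketch of the federated-learning client pattern, assuming Flower's NumPyClient API; the toy "model", data counts, and server address are illustrative.

```python
import numpy as np
import flwr as fl

class AveragingClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = [np.zeros(10)]          # toy "model": a single weight vector

    def get_parameters(self, config):
        return self.weights

    def fit(self, parameters, config):
        self.weights = [w + 0.1 for w in parameters]   # pretend local training step
        return self.weights, 100, {}                   # params, num examples, metrics

    def evaluate(self, parameters, config):
        loss = float(np.linalg.norm(parameters[0]))    # toy evaluation
        return loss, 100, {"loss": loss}

if __name__ == "__main__":
    fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=AveragingClient())
```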
In this talk, we outline how we introduced causality into our machine learning models within the core checkout and onboarding experiences globally, thereby strongly improving our key business metrics. We discuss case studies, where experimental data were combined with machine learning in order to create value for our users and personalize their experiences, and we share our lessons learned with the goal to inspire attendees to start incorporating causality into their machine learning solutions. Additionally, we explain how the open source Python package developed at Uber, CausalML, can help others in successfully making the transition from correlation-driven machine learning to causal-driven machine learning.
Knowledge work is undergoing a transformative journey with machine learning (ML) but the interpretability of the models we interact with is still lagging behind the coolness and hype of the technologies using ML. This workshop seeks to address the gap between the speed at which we use and adopt ML and the pace at which we understand it. During the workshop, we will cover fundamental concepts and techniques of interpretable machine learning, and explore various explainability methods supported by the Alibi Explain library so that you can get started explaining your models. If you've been meaning to dive deeper into the field of interpretable ML, add interpretability to your workflows, find an alibi for your models, or are simply curious about the field, come and join us for a fun 90-minute interactive session on interpretable ML.
Maintaining on-premise clusters poses quite a few challenges. One of these challenges is achieving developer autonomy, where developers can deploy applications themselves. This talk will cover how we set up Kubernetes to achieve exactly that.
Join us as we explore the complexities of balancing the electricity grid amidst the rise of renewable energy sources. We’ll discover the challenges in forecasting electricity consumption from diverse industrial resources and the modelling techniques employed by Sympower to achieve accurate forecasts. Gain insights into the trade-offs involved in aggregating data at different hierarchical levels in time series forecasting.
How can probabilistic forecasting accelerate the renewable energy transition? The rapid growth of non-steerable and intermittent wind and solar power requires accurate forecasts and the ability to plan under uncertainty. In this talk, we will make a case for using probabilistic forecasts over deterministic forecasts. We will cover methods for generating and evaluating probabilistic forecasts, and discuss how probabilistic price and wind power forecasts can be combined to derive optimal short-term power trading strategies.
Wouldn’t it be great to have a Google-like search engine, but then for your own text files and completely private? In this tutorial we’ll build a small personal search engine using the open-source library llama-index.
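To show how little code this takes, here is a minimal sketch assuming llama-index's VectorStoreIndex API; the "docs/" folder is a placeholder for your own text files, and a configured LLM/embedding backend (e.g. an OpenAI API key or a local model) is assumed.

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("Where did I store my 2022 tax notes?"))
```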
Pickle files can be evil: simply loading them can run arbitrary code on your system. This talk presents why that is, how it can be exploited, and how skops is tackling the issue for scikit-learn/statistical ML models. We go through some of the lower-level pickle-related machinery and go into detail on how the new format works.
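As a minimal sketch of the safer workflow, assuming skops.io's dump/load/get_untrusted_types API; the model is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skops.io import dump, load, get_untrusted_types

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

dump(model, "model.skops")                         # secure alternative to pickle.dump
unknown = get_untrusted_types(file="model.skops")  # inspect what the file wants to load
loaded = load("model.skops", trusted=unknown)      # only pass types you have reviewed
```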
The customers of Picnic use images and texts of products to decide if they like our products, so why not include those data streams in our Temporal Fusion Transformers that we use for Product Demand Forecasting?
Join us for a thrilling journey through convolutional, graph-based, and transformer-based architectures. Learn about methods to turn images, texts, and geographical information into features for other applications as we did for product demand forecasting. Discover how Picnic Technologies uses state-of-the-art multimodal approaches for demand forecasting to prevent food waste and keep our customers happy!
Data scientists and analysts are using quasi-experimental methods to make recommendations based on causality instead of randomized control trials. While these methods are easy to use, their assumptions can be complex to explain. This talk will explain these assumptions for data scientists and analysts without in-depth training in causal inference, so they can use and explain these methods more confidently to change people's minds using data.
This hands-on tutorial introduces participants to knowledge graph modeling using Neo4j, a popular graph database. Suitable for beginners and those seeking to enhance their knowledge, the tutorial will help attendees to learn the fundamentals of knowledge graphs, gain insights into Neo4j's modeling capabilities, and acquire practical skills in designing effective knowledge graph models.
Recommendation systems shape personalized experiences across various sectors, but evaluating their effectiveness remains a significant challenge. Drawing on experiences from industry leaders such as Booking.com, this talk introduces a robust, practical approach to A/B testing for assessing the quality of recommendation systems. The talk is designed for data scientists, statisticians, and business professionals, offering real-world insights and industry tricks on setting up A/B tests, interpreting results, and circumventing common pitfalls. While basic familiarity with recommendation systems and A/B testing is beneficial, it's not a prerequisite.
This talk delves into the topic of minimizing the Data Mesh mess. We will explore practical strategies and a data platform architecture for effectively governing and managing data within a decentralized data setup. We can balance decentralization and maintaining data quality by imposing a few constraints. The takeaways of this talk are drawn from the data platform at Enza Zaden.
AI won't end the world, but it can and already does make life miserable for plenty of folks. Instead of engaging with the AI overlords, let's explore a pragmatic set of design choices that all Data Scientists and ML devs can implement right now to reduce the risks of deploying AI systems in the real world.
If you are curious about the field of cryptography and what it has to offer data science and machine learning, this talk is for you! We'll dive into the field of encrypted computation, where decryption isn't needed in order to perform calculations, transformations and operations on the data. You'll learn some of the core mathematical theory behind why and how this works, as well as the differences between approaches like homomorphic encryption and secure multi-party computation. At the end, you'll get some pointers and open-source library hints on where to go next and how to start using encrypted computation for problems you are solving the hard way (or not solving at all).
ChatGPT is a fantastic assistant, but it cannot do everything yet. For example, it cannot automatically manage my calendar, update my to-do list, or do anything that requires it to perform actions. However, what would it take to make this a reality? I decided to put it to the test by allowing ChatGPT to manage my to-do list for me.
During this presentation, I will tell how I gave ChatGPT access to my to-do list. Along the way, I will introduce you to the concepts behind LLM-based agents and how they work. Of course, I will also give a demo of the final result. After this demo, we will dive into clever engineering solutions and tricks I discovered to solve problems such as handling hallucinations, parsing actions, etc.
This talk is for people who want to learn how to build their first LLM-based agent. Familiarity with Python, Pydantic, and LLMs is nice to have for this presentation but not essential. As long as you love overengineered solutions to a basic to-do list, you will like this presentation.
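As a minimal sketch of one of the tricks mentioned above, parsing an LLM "action" into a validated Pydantic model (Pydantic v2 assumed); the AddTodo schema and the raw JSON string are illustrative.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class AddTodo(BaseModel):
    title: str
    due_date: Optional[str] = None

raw_action = '{"title": "Prepare PyData talk", "due_date": "2023-09-14"}'  # pretend LLM output

try:
    action = AddTodo.model_validate_json(raw_action)
except ValidationError as err:
    # hallucinated or malformed actions are caught here and can be fed back to the model
    print(err)
else:
    print(f"Adding todo: {action.title} (due {action.due_date})")
```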
Have you ever struggled with a multitude of columns created by One Hot Encoder? Or decided to look beyond it, but found it hard to decide which feature encoder would be a good replacement?
Good news: many encoding techniques have been developed to address different types of categorical data. This talk will provide an overview of the various encoding methods available in data science, and guidance on deciding which one is appropriate for the data at hand.
Join this talk if you would like to hear about the importance of feature encoding and why it is important to not default to One Hot Encoding in every scenario. It will start with commonly used approaches and will progress into more advanced and powerful techniques which can help extract meaningful information from the data.
For each presented encoder, after this talk you will know:
- When to use it
- When NOT to use it
- Important considerations specific to the encoder
- The Python library that offers a built-in implementation of the encoder, facilitating easy integration into feature-engineering pipelines.
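As a tiny sketch of what one alternative to one-hot encoding looks like in code, assuming the category_encoders package; the toy data is illustrative.

```python
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({"city": ["Amsterdam", "Utrecht", "Amsterdam", "Rotterdam"],
                   "sold": [1, 0, 1, 0]})

# target encoding: replace each category with a (smoothed) mean of the target
encoder = ce.TargetEncoder(cols=["city"])
encoded = encoder.fit_transform(df[["city"]], df["sold"])
print(encoded)
```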
How do we speed up a critical missing operation in pandas, the cumulative index max, and what does this tell us about the compromises and considerations we must bring to optimizing our code?
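For orientation, here is a minimal sketch of one way to compute the index of the running maximum ("cumulative argmax") of a series; the numba approach is an assumption for illustration, not necessarily the optimization presented in the talk.

```python
import numpy as np
from numba import njit

@njit
def cumulative_argmax(values):
    out = np.empty(len(values), dtype=np.int64)
    best, best_idx = -np.inf, -1
    for i, v in enumerate(values):
        if v > best:
            best, best_idx = v, i
        out[i] = best_idx
    return out

print(cumulative_argmax(np.array([3.0, 1.0, 4.0, 1.0, 5.0])))  # [0 0 2 2 4]
```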
Back in 2018, a blogpost titled "Data's Inferno: 7 circles of data testing hell with Airflow" presented a layered approach to data quality checks in data applications and pipelines. Now, 5 years later, this talk looks back at Data's Inferno and surveys what has changed but also what hasn't in the space of ensuring high data quality.
Survival analysis was initially introduced to handle the data analysis required in use cases revolving around death and treatment in health care. Due to its merit, this method has spread to many other domains for analyzing and modeling data where the outcome is the time until an event of interest occurs, such as finance, economics, sociology, and engineering.
This talk aims at unraveling the potential of survival analysis with examples from different domains. A taxonomy of the existing descriptive and predictive analytics algorithms in survival analysis is presented. The concepts behind selected algorithms from each group are explained in detail, along with an example and implementation guideline using the right open-source framework.
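As a first taste of the tooling, here is a minimal sketch using the lifelines package as one possible open-source framework; the bundled example dataset is illustrative.

```python
from lifelines import KaplanMeierFitter
from lifelines.datasets import load_waltons

df = load_waltons()                        # durations "T" and event indicators "E"
kmf = KaplanMeierFitter()
kmf.fit(df["T"], event_observed=df["E"])   # estimate the survival function
print(kmf.survival_function_.head())
```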
Deepfakes, a form of synthetic media where a person's image or video is seamlessly replaced using Generative AI like GANs, have received significant attention. This talk aims to provide a comprehensive exploration of deepfakes, covering their creation process, positive and negative effects, development pace, and tools for detection. By the end of the presentation, attendees will know how to create and detect deepfakes and will have a deep understanding of the technology and its impact.
Feature Stores are a vital part of the MLOps stack for managing machine learning features and ensuring data consistency. This talk introduces Feature Stores and the underlying data management architecture. We’ll then discuss the challenges and learnings of integrating DuckDB and Arrow Flight into our Feature Store platform, and share benchmarks showing up to 30x speedups compared to Spark/Hive. Discover how DuckDB and Arrow Flight can also speed up your data management and machine learning pipelines.
Data science, IT and software development become more and more complex and are subject to increasing requirements and fast-paced business demand. Higher complexity, higher pace and higher quality requirements result in more pressure on our fellow data engineers and data scientists.
More pressure, but are we resilient enough to withstand it? You have probably already seen the outcome: unhappiness, stress, or even burn-out among co-workers, instead of cool code, great solutions, and building a better world with your skills.
How do you change the pressure and stress you perceive as a data scientist, data engineer, or ML engineer? How do you ensure that your brain’s frontal lobe returns to a problem-solving and decision-making state?
Keynote by Thomas Wolf. He will be accompanied on stage by Alessandro Cappelli, Julien Launay & Guilherme Penedo, all members of the Hugging Face team in Amsterdam working on large model training.
In this talk I will try to show you what might happen if you allow yourself the creative freedom to rethink and reinvent common practices once in a while. As it turns out, in order to do that, natural intelligence is all you need. And we may start needing a lot of it in the near future.
This talk explores distillation learning, a powerful technique for compressing and transferring knowledge from larger neural networks to smaller, more efficient ones. It delves into its core components and various applications such as model compression and transfer learning. The speaker aims to simplify the topic for all audiences and provides an implementation, demonstrating how to apply distillation learning in real scenarios. Attendees will gain insights into developing efficient neural networks by reviewing various examples of distilling a complex model. The material will be available online for convenient access and understanding.
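As a minimal sketch of the core idea, here is a knowledge-distillation loss in PyTorch; the temperature, weighting, and random logits are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets: match the teacher's softened output distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # hard targets: ordinary cross-entropy on the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```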
Do you test your data pipeline? Do you use Hypothesis? In this workshop, we will use Hypothesis, a property-based testing framework, to generate Pandas DataFrames for your tests, without involving any real data.
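As a minimal sketch of the pattern, assuming Hypothesis's pandas extension; the pipeline step being tested (normalize) is illustrative.

```python
from hypothesis import given
from hypothesis.extra.pandas import data_frames, column

def normalize(df):
    return (df - df.mean()) / df.std()

@given(data_frames([column("price", dtype=float), column("qty", dtype=int)]))
def test_normalize_keeps_columns(df):
    # Hypothesis generates many DataFrames, including empty and edge-case ones
    assert list(normalize(df).columns) == ["price", "qty"]
```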
An exploration of the intersection between data, AI, and environmental conservation. In this talk, we will share our experiences and practical insights from our journey of developing a system using Python, camera traps, and data-driven techniques to help detect poachers in Africa.
Did you know that you can do transfer learning on boosted forests too? Even today, we face business cases where the modelling sample is very small. This introduces uncertainty into the modelling results and, in some cases, makes modelling impossible. To counter this, we investigated the use of transfer learning approaches on boosting models. In this talk, we would like to show the methods used and results from a real case example applied to the credit risk domain.
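As a minimal sketch of one way to "transfer" a boosted model, continue boosting from a model trained on a larger source dataset; the datasets and parameters are illustrative, not necessarily the method used in the talk.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(5000, 10)), rng.integers(0, 2, 5000)   # large source sample
X_tgt, y_tgt = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)     # small target sample

params = {"objective": "binary:logistic", "max_depth": 3}
source_model = xgb.train(params, xgb.DMatrix(X_src, y_src), num_boost_round=100)

# continue training on the small target sample, starting from the source model
target_model = xgb.train(params, xgb.DMatrix(X_tgt, y_tgt),
                         num_boost_round=50, xgb_model=source_model)
```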
This informative talk aims to close the gap between the theory of data contracts and their real-life implementations. It contains a few Python code snippets and is aimed primarily at data and software engineers. However, it could be food for thought for machine learning engineers, data scientists, and other data consumers.
Optimizing machine learning models using regular metrics is a common practice in the industry. However, aligning model optimization with business metrics is closely tied to the objectives of the business and is highly valued by product managers and other stakeholders. This talk delves into the process of training machine learning models based on business metrics in order to enhance economic outcomes. With a primary focus on data scientists and machine learning practitioners, this talk explores techniques, methodologies, and real-world applications that harness the power of business metrics to propel machine learning models and foster business success. We will present a specific case study that demonstrates how we utilized business metrics at Booking.com with significant impact on model performance and business outcomes. Specifically, we will discuss our approaches to leveraging business metrics for hyperparameter tuning and reducing model complexity, which instill greater confidence within our team when deploying improved models to production.
In the Netherlands a large share of energy is used by industry. By measuring the energy usage of individual machines in real time it is possible to pinpoint when machines are operating inefficiently and help factories take measures to reduce energy waste. It turns out that in most factories, the biggest source of energy waste comes from idling machines. To be able to give valuable insights and provide relevant alerts to our customers, we set up a machine learning system for standby detection with a “human in the loop”. In this talk we will go over the considerations that go into setting up a machine learning system with a human in the loop and showcase our approach to the problem. No background knowledge is required for this talk.
Some say machine learning projects fail because they live in notebooks.
But I would bet that even more of them fail because they solve a problem that doesn’t exist, or use an interface that’s not feasible. In other words, they fail because they don’t validate their underlying assumptions.
Product analytics helps build models that solve real problems. In my time at ING, I’ve been dealing with a lot of the latter, and I’ll be sharing my thoughts on how to find problems worth solving with data science.
Many algorithms for machine learning from time series are based on measuring the distance or similarity between series. The most popular distance measure is dynamic time warping, which attempts to optimally realign two series to compensate for offset. There are many others, though. We present an overview of the most popular time-series-specific distance functions and describe their speed-optimised implementations in aeon, a scikit-learn compatible time series machine learning toolkit. We demonstrate their application to clustering, classification, and regression on a real-world case study and highlight some of the latest distance-based time series machine learning tools available in aeon.
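As a one-liner taste of the toolkit, here is a minimal sketch assuming aeon's distances module; the two short series are illustrative.

```python
import numpy as np
from aeon.distances import dtw_distance

x = np.array([1.0, 2.0, 3.0, 4.0, 3.0])
y = np.array([1.0, 1.0, 2.0, 3.0, 4.0])
print(dtw_distance(x, y))   # DTW realigns the series before measuring the distance
```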
In this talk, we will see why the expression engine in Polars is so versatile and fast. We will look at expressions from the perspective of the query optimizer as well as the physical engine.
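As a minimal sketch of the expression API and the lazy optimizer (a recent Polars version is assumed; the data is illustrative):

```python
import polars as pl

df = pl.DataFrame({"store": ["a", "a", "b", "b"], "revenue": [10, 20, 30, 40]})

lazy = (
    df.lazy()
      .filter(pl.col("revenue") > 15)
      .with_columns((pl.col("revenue") * 1.21).alias("revenue_incl_vat"))
)
print(lazy.explain())   # inspect the optimized query plan
print(lazy.collect())
```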
Are you 100% sure that you can trust your labels?
Imagine spending a company credit card worth of compute on getting the best model statistics ever. Would that be money well spent if your dataset has some labeling issues?
More often than not, "bad labels" are great because they can tell you how to improve the machine learning model before even training it. But it only works if you actually spend the time being confronted with your own dataset. In this workshop, we'll annotate our own data while we leverage techniques to find happy accidents. To solve specific problems, you don't need loads of data anymore – you just need good data.
In this talk, we will explore the Bayesian Bradley-Terry model implemented in PyMC. We will focus on its application for ranking tennis players, demonstrating how this probabilistic approach can provide accurate and robust rankings, arguably better than the ATP ranking itself and the Elo rating system.
By leveraging the power of Bayesian statistics, we can incorporate prior knowledge, handle uncertainty, and make better inferences about player abilities. Join us to learn how to implement the Bayesian Bradley Terry model in PyMC and discover its advantages for ranking tennis players.
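As a minimal sketch of a Bayesian Bradley-Terry model in PyMC (v5 API assumed); the match data (winner/loser indices) is illustrative.

```python
import numpy as np
import pymc as pm

n_players = 4
winners = np.array([0, 0, 1, 2, 3, 0])   # index of the match winner
losers  = np.array([1, 2, 2, 3, 1, 3])   # index of the match loser

with pm.Model() as model:
    skill = pm.Normal("skill", mu=0.0, sigma=1.0, shape=n_players)
    # probability that the recorded winner beats the recorded loser
    pm.Bernoulli("win", logit_p=skill[winners] - skill[losers],
                 observed=np.ones(len(winners)))
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=42)

print(idata.posterior["skill"].mean(dim=("chain", "draw")))
```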
In this talk, we would like to introduce you to the urban challenges that the City of Amsterdam is trying to solve using AI. We will walk you through the technical details behind one of our projects and invite you to join us in the ethical development of cool AI applications for social good.