PyData Amsterdam 2023

Reliable and Scalable ML Serving: Best Practices for Online Model Deployment
09-14, 12:00–12:30 (Europe/Amsterdam), Bar

Working on ML serving for a couple of years, we have learned a lot. I would like to share a set of best practices and learnings with the community.


At Adyen we deploy many models for online inference in the payment flow. Working in the MLOps team to streamline this process, I learned a lot about best practices and things to consider before (and after) putting a model online. These are small things, but they contribute to a reliable production setup for online inference. Some examples:

  • Adding metadata & creating a self-contained archive (see the sketch after this list)
  • Separating serving sources from training sources
  • Choosing the model's dependency requirements
  • Adding an example input & output request
  • Adding schemas for input and output (a sketch covering both points follows this list)
  • Common issues when putting models online: memory leaks, concurrency
  • Which server is best: process-based or thread-based? (a gunicorn config sketch follows this list)
  • How different Python versions affect inference (execution) time
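
As a taste of the archive point, here is a minimal sketch of a self-contained model archive: the pickled model is bundled with a metadata file recording when it was built and under which Python version. The function name save_model_archive and the exact metadata fields are illustrative, not Adyen's actual tooling.

    import json
    import pickle
    import sys
    import tarfile
    import tempfile
    from datetime import datetime, timezone
    from pathlib import Path

    def save_model_archive(model, archive_path: str, feature_names: list[str]) -> None:
        """Bundle the pickled model and its metadata into one tarball."""
        metadata = {
            "created_at": datetime.now(timezone.utc).isoformat(),
            "python_version": sys.version,   # serving should match this
            "feature_names": feature_names,  # what callers must send
        }
        with tempfile.TemporaryDirectory() as tmp:
            tmp_dir = Path(tmp)
            (tmp_dir / "model.pkl").write_bytes(pickle.dumps(model))
            (tmp_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
            with tarfile.open(archive_path, "w:gz") as tar:
                # arcname keeps paths relative, so the archive stays self-contained
                tar.add(tmp_dir / "model.pkl", arcname="model.pkl")
                tar.add(tmp_dir / "metadata.json", arcname="metadata.json")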
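
For the schema and example-request points, a minimal sketch using Pydantic; the field names (transaction_amount, risk_score, etc.) are made-up placeholders, not a real payment schema.

    from pydantic import BaseModel

    class PredictionRequest(BaseModel):
        """Schema the endpoint validates incoming payloads against."""
        transaction_amount: float  # hypothetical fields, for illustration only
        currency: str
        card_country: str

    class PredictionResponse(BaseModel):
        """Schema of the model's answer, so consumers know what to expect."""
        risk_score: float

    # An example request/response pair shipped alongside the model doubles as
    # documentation and as a smoke test right after deployment.
    EXAMPLE_REQUEST = {"transaction_amount": 120.50, "currency": "EUR", "card_country": "NL"}
    EXAMPLE_RESPONSE = {"risk_score": 0.03}

    # Validating the examples against the schemas catches drift early.
    PredictionRequest(**EXAMPLE_REQUEST)
    PredictionResponse(**EXAMPLE_RESPONSE)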
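
On the process-versus-thread question, a gunicorn configuration file (which is plain Python) makes the trade-off concrete. The values below are illustrative defaults, not a recommendation for any particular model.

    # gunicorn_conf.py -- gunicorn configuration files are plain Python
    import multiprocessing

    bind = "0.0.0.0:8000"

    # Process-based: each worker is a separate process with its own GIL,
    # so CPU-bound inference scales across cores (at a memory cost per
    # copy of the model).
    workers = multiprocessing.cpu_count()

    # Thread-based alternative: fewer processes, more threads per process.
    # Threads share the GIL, so this mostly helps I/O-bound endpoints
    # (e.g. feature lookups), not CPU-bound model code.
    # workers = 2
    # threads = 8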

Prior Knowledge Expected

No previous knowledge expected

Staff Engineer @ Adyen. I am passionate about high-performance distributed systems. Recently I have been working on scaling Adyen's Data & ML infrastructure.