PyData Amsterdam 2023

Reliable and Scalable ML Serving: Best Practices for Online Model Deployment
09-14, 12:00–12:30 (Europe/Amsterdam), Bar

After working on ML serving for a couple of years, we have learned a lot. I would like to share a set of best practices and learnings with the community.

At Adyen we deploy a lot of models for online inference in the payment flow. Working in the MLOps team to streamline this process, I learned a lot about best practices and things to consider before and after putting a model online. These are small things, but they contribute to a reliable, production-ready setup for online inference. Some examples:

  • Adding metadata & creating a self-contained archive
  • Separating serving sources from training sources
  • Choosing the model's requirements
  • Adding an example input & output request
  • Adding schemas for input and output
  • Common issues when putting models online: memory leaks, concurrency
  • Which server is best: process-based or thread-based?
  • How different Python versions affect inference (execution) time
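
To make a few of these points concrete (metadata, a self-contained archive, example requests, and input/output schemas), here is a minimal sketch. It is not Adyen's actual packaging format: the file names, the tiny schema format, and the toy linear model are all assumptions made for illustration, and only the Python standard library is used.

```python
import io
import json
import sys
import zipfile

# Hypothetical mini-schema: field name -> name of the expected Python type.
INPUT_SCHEMA = {"amount": "float"}
OUTPUT_SCHEMA = {"score": "float"}
TYPES = {"float": float, "int": int, "str": str}


def validate(payload, schema):
    """Raise ValueError unless payload has exactly the schema's fields and types."""
    if set(payload) != set(schema):
        raise ValueError(f"expected fields {sorted(schema)}, got {sorted(payload)}")
    for name, type_name in schema.items():
        if not isinstance(payload[name], TYPES[type_name]):
            raise ValueError(f"field {name!r} is not a {type_name}")


def predict(params, features):
    """Toy linear model standing in for a real one (an assumption for this sketch)."""
    validate(features, INPUT_SCHEMA)
    score = params["bias"] + sum(
        weight * features[name] for name, weight in params["weights"].items()
    )
    result = {"score": score}
    validate(result, OUTPUT_SCHEMA)
    return result


def build_archive(params, example_input):
    """Pack parameters, metadata, schemas, and an example request into one zip."""
    files = {
        "model.json": params,
        "metadata.json": {
            "name": "toy-linear-model",  # hypothetical model name
            "version": "0.1.0",
            "python_version": "{}.{}".format(*sys.version_info[:2]),
        },
        "schema.json": {"input": INPUT_SCHEMA, "output": OUTPUT_SCHEMA},
        "example.json": {
            "input": example_input,
            "output": predict(params, example_input),
        },
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, json.dumps(content))
    return buffer.getvalue()


def smoke_test(archive_bytes):
    """Reload the archive and check the model reproduces its recorded example."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        params = json.loads(archive.read("model.json"))
        example = json.loads(archive.read("example.json"))
    return predict(params, example["input"]) == example["output"]


archive_bytes = build_archive({"weights": {"amount": 2.0}, "bias": 0.5}, {"amount": 12.5})
print(smoke_test(archive_bytes))
```

Storing an example request next to the model means a serving environment can run a check like `smoke_test` at load time, before accepting traffic, and refuse to serve if the recorded output is not reproduced.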

Prior Knowledge Expected

No previous knowledge expected

Staff Engineer @ Adyen. I am passionate about high-performance distributed systems. Recently I have been working on scaling Adyen's Data & ML infrastructure.