PyData Amsterdam 2023

Let's do the time warp again: time series machine learning with distance functions
09-16, 11:30–12:00 (Europe/Amsterdam), Qux

Many algorithms for machine learning from time series are based on measuring the distance or similarity between series. The most popular distance measure is dynamic time warping, which attempts to optimally realign two series to compensate for offset. There are many others, though. We present an overview of the most popular time-series-specific distance functions and describe their speed-optimised implementations in aeon, a scikit-learn compatible time series machine learning toolkit. We demonstrate their application to clustering, classification and regression in a real-world case study and highlight some of the latest distance-based time series machine learning tools available in aeon.


This talk introduces you to popular time series distance functions and demonstrates their usage in exploratory and predictive modelling of time series. Participants will come away with an idea of how to use the very latest research into time series distances for clustering, classification and regression using the aeon toolkit and scikit-learn. The talk will be mostly practical and code-based, with some algorithmic and mathematical notation.

Distances are used in all forms of time series machine learning. They can help explore collections of time series through clustering, reduce dimensionality by averaging and be used with instance based or kernel based classifiers and regressors. They are used in streaming based anomaly detection and change point detection and have been embedded within tree based ensembles for classification.
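As a rough illustration of the instance-based case, the sketch below shows a one-nearest-neighbour classifier that takes the distance function as a parameter. It is plain Python written for this page, not aeon or scikit-learn code. The names `lockstep_distance` and `nearest_neighbour_predict` and the toy data are assumptions made for the example.

```python
def lockstep_distance(x, y):
    """Sum of absolute differences, comparing x[i] with y[i] only (no realignment)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def nearest_neighbour_predict(train_series, train_labels, query, distance):
    """Return the label of the training series closest to `query` under `distance`."""
    best_index = min(
        range(len(train_series)),
        key=lambda i: distance(train_series[i], query),
    )
    return train_labels[best_index]


# Toy data (hypothetical): two labelled series and one unlabelled query.
train_series = [[0, 1, 2, 3], [3, 2, 1, 0]]
train_labels = ["rising", "falling"]
query = [0, 1, 1, 3]

predicted = nearest_neighbour_predict(train_series, train_labels, query, lockstep_distance)
print(predicted)  # rising
```

Because the classifier only calls `distance(series, query)`, any function with the same two-argument shape, such as an elastic distance, could be passed in its place.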

The basic problem in specifying a distance function is to quantify how dissimilar two series are. Elastic distances attempt to compensate for small misalignments caused by offset, which would otherwise make similar series look very different under measures such as Euclidean distance or correlation. Many different algorithms have been proposed that combine forms of time warping (stretching the indexes to realign series) and editing (removing time points from one of the series to improve alignment). In the first part of the talk we will provide a high-level overview and visualisation of the differences between these algorithms before describing the aeon toolkit, which contains the most comprehensive and fastest library of elastic distances that we are aware of. aeon distances can be used directly with scikit-learn distance-based algorithms and with the many time-series-specific algorithms for classification, clustering and regression available in aeon. In the middle section of the talk we will use a real-world industrial dataset to demonstrate use cases in clustering, classification and regression. We will end with some pointers to the very latest research into using distance functions. We will require attendees to have a basic knowledge of scikit-learn and standard machine learning algorithms.
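To make the idea of an elastic distance concrete, here is a minimal pure-Python sketch of dynamic time warping with an optional warping window, compared against a lock-step squared Euclidean distance on two series that differ only by a one-step offset. It is a textbook-style dynamic programme written for illustration, not aeon's optimised implementation. The function names, the squared pointwise cost and the `window` parameter are assumptions of this sketch.

```python
import math


def squared_euclidean(x, y):
    """Lock-step distance: sum of squared differences between x[i] and y[i]."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dtw_distance(x, y, window=None):
    """Dynamic time warping cost using squared pointwise differences.

    `window` is the maximum allowed |i - j| between aligned indexes
    (a Sakoe-Chiba style band); None means no constraint.
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide, or no valid path reaches the end.
    w = max(n, m) if window is None else max(window, abs(n - m))
    # cost[i][j] = cheapest alignment of the first i points of x with the first j of y.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            pointwise = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = pointwise + min(
                cost[i - 1][j],      # x advances, y repeats its point
                cost[i][j - 1],      # y advances, x repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 0, 0, 1, 2, 1, 0, 0]

print(squared_euclidean(a, b))       # 4
print(dtw_distance(a, b))            # 0.0
print(dtw_distance(a, b, window=0))  # 4.0
```

With no window, warping absorbs the offset completely and the cost is zero. With `window=0` no warping is allowed and the result matches the lock-step distance.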

This should appeal to anyone interested in machine learning from time series. It will focus on practical application and algorithm comprehension rather than maths, and will identify the very latest research into algorithm development to suggest further reading. We will provide easy-to-follow notebooks prior to the talk and all examples will be freely available.


Prior Knowledge Expected

No previous knowledge expected

Tony is a Professor of Computer Science at the University of East Anglia, where he leads the time series machine learning group. His primary research interest is in time series machine learning, with a historic focus on classification, but more recently looking at clustering and regression. He has a side interest in ensemble design.