Computer vision
13/11/2020

Topological data analysis with Mapper


Auteur : Milton Minervino
Temps de lecture : 8 minutes
Quantmetry.com : Topological data analysis with Mapper

Introduction

Topological data analysis (TDA) is a recent branch of data analysis that uses topology and in particular persistent homology. TDA is used to extract meaning from the shape of data.

In this article we will explore a particular technique of TDA called the Mapper algorithm introduced by Singh, Memoli and Carlsson in their seminal paper Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition (2007).

Mapper is a combination of dimensionality reduction, clustering and graph networks techniques used to get higher level understanding of the structure of data. It is mainly used for: 

  • visualising the shape of data through a particular lens
  • detecting clusters and interesting topological structures which traditional methods fail to find 
  • selecting features that best discriminate data and for model interpretability

A flourishing literature shows its high potential in real data applications and especially in biology and medicine. Created by some founders of the celebrated silicon valley start-up Ayasdi, Mapper is applied in business to customer segmentation, hot spot analysis, fraud detection, churn prediction, drug discovery, socio-economic, sensor or text data as well.

Performance profiles for each of the 452 players in the NBA by using data from the 2010–2011 NBA season. The number of the covering intervals on the left-hand figure is 20 while on the right-hand is 30 (higher resolution). Graphs are colored by points per game and the lens is chosen to be the projection to the principal and secondary SVD values. For more details, see Extracting insights from the shape of complex data using topology. Precise definitions of coverings and lenses are described in Section “The algorithm”.

 

 

 

In this article we will briefly describe what Mapper technically is, reveal the advantages that it carries and show some applications as well as an exploratory case study. In a forthcoming article we will explore how TDA and persistent homology can be used effectively with Machine Learning.

The Algorithm

Given a dataset of points, the basic steps behind Mapper are as follows:

  1. Map to a lower-dimensional space using a filter function f, or lens. Common choices for the filter function include projection onto one or more axes via PCA or density-based methods. 
  2. Construct a cover (U_i)_{i\in I} of the projected space typically in the form of a set of overlapping intervals which have constant length.
  3. For each interval U_i  cluster the points in the preimage f^{-1}(U_i)  into sets C_{i,1},\ldots,C_{i,k_i}.
  4. Construct the graph whose vertices are the cluster sets and an edge exists between two vertices if two clusters share some points in common.

Illustration of the mapper algorithm (source): data is represented