Student Projects

Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival

Project for Mathematical Structures of Complex Systems, Heidelberg University, WS 2019/2020

by Sebastian Dobrzynski, André Schulze



Topological Data Analysis (TDA) is a fairly new field that uses results from algebraic topology to study the structure of point cloud data sets equipped with a notion of distance. One of the biggest advantages of topological approaches in data science is that their output is stable under small perturbations of the data. The most prominent tool in the TDA toolkit is persistent homology. There, one constructs a filtration of simplicial complexes whose vertices are the data points, and then uses homology theory to find clusters or to classify the points. See Kraft [1] for further details.
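To make the clustering use of persistent homology concrete, here is a minimal sketch of its simplest case, degree-0 persistence of a Vietoris-Rips filtration, computed with a union-find structure. It is an illustration only, not the method used in the project. The function name h0_persistence, the sample points, and the convention that an edge enters the filtration when the scale reaches its length are all assumptions made for this example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Assumed convention: every point is born at scale 0, and an edge enters
    the filtration once the scale reaches the distance between its endpoints.
    A component dies when such an edge merges it into another component.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the order in which they appear).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge, one of them dies
            deaths.append(length)
    return deaths  # n - 1 finite deaths; the last component never dies


# Hypothetical toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1)]
deaths = h0_persistence(points)
print(deaths)
```

In this toy example three components die at scale 1 and one survives until scale 9. That large gap between death scales is what indicates two clusters.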

The tool we use in this work is the mapper algorithm. The key idea is to identify local clusters within the data and then to understand how these small clusters interact by connecting them into a graph whose shape captures aspects of the topology of the data set. The theoretical foundation of the algorithm is provided by the Nerve Theorem from algebraic topology.
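The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the project: the filter function (here the x-coordinate), the cover by overlapping intervals, the single-linkage clustering with threshold eps, and all function and parameter names are hypothetical choices made for this example.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] by equal-length intervals, each widened on both sides
    by the fraction `overlap` of its length so that neighbours overlap."""
    length = (hi - lo) / n_intervals
    pad = length * overlap
    return [
        (lo + i * length - pad, lo + (i + 1) * length + pad)
        for i in range(n_intervals)
    ]


def single_linkage_clusters(indices, points, eps):
    """Local clustering: connected components of the graph that links
    points at distance at most eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=0.5):
    """Return (nodes, edges): nodes are local clusters (sets of point
    indices); an edge joins two clusters that share at least one point."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        preimage = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(preimage, points, eps))
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


# Hypothetical toy data: 40 evenly spaced points on the unit circle.
circle = [
    (math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
    for k in range(40)
]
nodes, edges = mapper_graph(circle, lambda p: p[0], eps=0.3)
print(len(nodes), len(edges))
```

On this toy input the sketch produces six clusters joined in a single cycle, so the graph recovers the loop in the data. The graph built here is the 1-skeleton of the nerve of the cover by clusters, which is where the Nerve Theorem enters.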


Figure: Simplicial complex for the GSE45827 data set.