“Neural Networks, Manifolds, and Topology”, Chris Olah 2014-04-06:

[Discussion of geometric interpretations of neural networks: each layer in a NN continuously ‘squashes’ or ‘squeezes’ points (data), gradually associating like with like, and creating new abstractions/representations. By stacking many of these layers, a NN can approximate extremely complex nonlinear functions which solve the problem. Olah provides animations to visualize how the datapoints in standard toy problems like the ‘Swiss roll’ example are stretched and warped until the classes can be separated easily by a simple linear boundary. This helps us understand what a NN does, and can provide some simple limiting results on what a NN of a given size/depth can or cannot do.]

…it can be quite challenging to understand what a neural network is really doing. If one trains it well, it achieves high-quality results, but it is challenging to understand how it is doing so. If the network fails, it is hard to understand what went wrong. While it is challenging to understand the behavior of deep neural networks in general, it turns out to be much easier to explore low-dimensional deep neural networks—networks that only have a few neurons in each layer. In fact, we can create visualizations to completely understand the behavior and training of such networks. This perspective will allow us to gain deeper intuition about the behavior of neural networks and observe a connection linking neural networks to an area of mathematics called topology.
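[Each layer of such a low-dimensional network is just a continuous map of the plane. A minimal NumPy sketch (the weights below are arbitrary illustrative values, not taken from the post) shows the key property behind Olah's animations: a tanh layer with an invertible weight matrix sends nearby points to nearby points, stretching and squashing the plane without tearing it:

```python
import numpy as np

# A single 2-neuron tanh layer: x -> tanh(W @ x + b).
# W and b are hypothetical illustrative values, not from the article.
W = np.array([[1.0, 0.5],
              [0.5, 1.0]])   # invertible (det = 0.75), so the layer is a
b = np.array([0.1, -0.2])    # homeomorphism onto its image

def layer(x):
    """One continuous 'warp' of the plane, as in Olah's animations."""
    return np.tanh(W @ x + b)

# Two nearby input points stay nearby after the warp: the map is
# continuous, so it deforms the data without tearing it apart.
p = np.array([0.3, 0.7])
q = p + 1e-3                  # a small perturbation of p
d_in  = np.linalg.norm(q - p)
d_out = np.linalg.norm(layer(q) - layer(p))
print(d_in, d_out)            # d_out is the same order of magnitude as d_in
```

Stacking several such maps composes many small continuous warps into one complicated—but still continuous—deformation, which is why the topology of the input data constrains what the network can do.]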

…Topological properties of data, such as links, may make it impossible to linearly separate classes using low-dimensional networks, regardless of depth. Even in cases where it is technically possible, such as spirals, it can be very challenging to do so. To accurately classify data with neural networks, wide layers are sometimes necessary. Further, traditional neural network layers do not seem to be very good at representing important manipulations of manifolds; even if we were to cleverly set weights by hand, it would be challenging to compactly represent the transformations we want. New layers, specifically motivated by the manifold perspective of machine learning, may be useful supplements.
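[A toy instance of the separability point (my own illustration, not code from the post): with two concentric circles, one class encloses the other, so no line in the plane can separate them—yet a single hand-chosen nonlinear feature, the squared radius, makes them perfectly linearly separable:

```python
import numpy as np

# Two concentric circles: class 0 at radius 1, class 1 at radius 3.
# Hypothetical toy data constructed here for illustration.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
inner = np.stack([np.cos(theta), np.sin(theta)], axis=1)      # label 0
outer = 3 * np.stack([np.cos(theta), np.sin(theta)], axis=1)  # label 1
X = np.vstack([inner, outer])
y = np.concatenate([np.zeros(100), np.ones(100)])

# A linear decision rule (here: sign of the first coordinate) is at
# chance, because every half-plane contains half of *both* circles.
acc_linear = np.mean((X[:, 0] > 0) == y)

# One nonlinear feature -- the squared radius -- untangles the classes:
# thresholding r^2 at 4 (between 1 and 9) classifies perfectly.
r2 = (X ** 2).sum(axis=1)
acc_radial = np.mean((r2 > 4) == y)

print(acc_linear, acc_radial)   # 0.5 vs 1.0
```

This is the sense in which a "new layer" encoding the right manifold manipulation (here, measuring distance from a center) can compactly represent what ordinary affine-plus-nonlinearity layers struggle to express.]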

The Manifold Hypothesis: Is this relevant to real-world data sets, like image data? If you take the manifold hypothesis really seriously, I think it bears consideration. The manifold hypothesis is that natural data forms lower-dimensional manifolds in its embedding space. There are both theoretical and experimental reasons to believe this to be true. If you believe this, then the task of a classification algorithm is fundamentally to separate a bunch of tangled manifolds.
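[A quick numerical sanity check of this idea (my own sketch, not from the post): sample a 1-dimensional manifold—a helix—embedded in 3 dimensions. A local PCA on a small patch of it finds nearly all the variance concentrated in a single direction, i.e. the data is locally low-dimensional despite its higher-dimensional embedding:

```python
import numpy as np

# A helix is a 1-D manifold embedded in R^3; sample a small patch of it.
t = np.linspace(0.0, 0.1, 200)              # short arc => locally ~linear
patch = np.stack([np.cos(t), np.sin(t), t], axis=1)

# Local PCA via SVD of the centered patch: the singular values measure
# how much variance each principal direction carries.
centered = patch - patch.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = s[0] ** 2 / (s ** 2).sum()

print(explained)   # close to 1: one intrinsic dimension, 3 ambient dimensions
```

Experimental evidence for the manifold hypothesis in real data (e.g. intrinsic-dimension estimates for image datasets) amounts to a more sophisticated version of this local-dimension measurement.]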