A follow-up comment after having studied the paper a bit more, since you asked about where the geometry comes into play.

One of the references the paper provides is to this[1] paper, which shows how the non-linear layers in modern deep neural networks partition the input space into regions and apply region-dependent affine mappings[2] to generate the output. It also mentions how that connects to vector quantization and k-means clustering.
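To make the "regions with their own affine map" idea concrete, here is a minimal sketch in plain Python. It is not taken from either paper: the tiny one-hidden-layer ReLU network, its weights, and the helper names (`forward`, `activation_pattern`, `region_affine_map`) are all made up for illustration. The point is only that the on/off pattern of the ReLU units identifies a region, and that inside a region the network collapses to a single `A x + c`.

```python
# Toy one-hidden-layer ReLU network on 2-D inputs.
# All weights are arbitrary, hypothetical values chosen for illustration.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.5, 0.5]]  # 3 hidden units x 2 inputs
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, -2.0, 0.5]]                      # 1 output x 3 hidden units
b2 = [0.25]


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def preactivations(x):
    """Hidden-layer values before the ReLU is applied."""
    return [p + b for p, b in zip(matvec(W1, x), b1)]


def forward(x):
    """Ordinary forward pass: W2 * relu(W1 x + b1) + b2."""
    hidden = [max(0.0, p) for p in preactivations(x)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def activation_pattern(x):
    """Which hidden units are active at x; this labels the region x falls in."""
    return tuple(p > 0 for p in preactivations(x))


def region_affine_map(pattern):
    """Return (A, c) with forward(x) == A x + c for every x that has this pattern."""
    # Inactive units contribute nothing, so zero out their rows and biases.
    masked_W1 = [[w if on else 0.0 for w in row] for row, on in zip(W1, pattern)]
    masked_b1 = [b if on else 0.0 for b, on in zip(b1, pattern)]
    n_hidden, n_in = len(masked_W1), len(masked_W1[0])
    A = [[sum(row[k] * masked_W1[k][j] for k in range(n_hidden)) for j in range(n_in)]
         for row in W2]
    c = [s + b for s, b in zip(matvec(W2, masked_b1), b2)]
    return A, c


def affine(A, c, x):
    """Evaluate the affine map A x + c."""
    return [s + b for s, b in zip(matvec(A, x), c)]


if __name__ == "__main__":
    for x in ([1.0, 0.2], [1.2, 0.1], [-1.0, 1.0]):
        pattern = activation_pattern(x)
        A, c = region_affine_map(pattern)
        print(x, pattern, forward(x), affine(A, c, x))
```

The first two points share an activation pattern, so they get the same `(A, c)`; the third falls in a different region and gets a different one. Deeper networks do the same thing layer by layer, just with many more regions.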

So, the geometric perspective isn't referring to your typical high-school geometry, but to more abstract concepts like vector spaces[3] and combinatorial computational geometry[4].

The submitted paper shows that this partitioning is directly linked to the approximation power of the neural network. The authors then show that increasing the approximation power results in better answers to math word problems, and hence that the approximation power correlates with the reasoning ability of LLMs.

[1]: https://arxiv.org/abs/1805.06576v2

[2]: https://en.wikipedia.org/wiki/Affine_transformation

[3]: https://en.wikipedia.org/wiki/Vector_space

[4]: https://en.wikipedia.org/wiki/Computational_geometry#Combina...