Hacker News

Thank you for sharing this. I've added it to my reading list!


The last figure in this paper is a huge disappointment when you consider how reality meets such theoretical arguments. A typical Mamba model has around 100 layers (maybe 60 for smaller ones). That figure only scales up to 4 layers, and since these are sufficient for the problem they consider, the argument goes that another RNN would have needed only one layer for it.


Why bother with more layers? The conventional transformer's lack of recurrent layers is mostly a weakness. The final layers of an LLM do very little (except the last one) and most of the time just let the token pass through from the layer that actually determined it. A recurrent architecture could dynamically perform as many passes as it needs to produce the next token. This would give a speedup on easy tokens and a slowdown on hard tokens, compared with the fixed "you must go through all layers regardless" depth of classical LLMs.
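A minimal sketch of the idea, assuming a single shared recurrent block and a hypothetical convergence test as the stopping rule (everything here is illustrative, not any model's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def step(h, W):
    """One pass of a shared recurrent block (a stand-in for a transformer layer)."""
    return np.tanh(W @ h)

def adaptive_depth_forward(h, W, max_steps=100, tol=1e-3):
    """Iterate the shared block until the hidden state stops changing:
    few passes for 'easy' tokens, more for 'hard' ones."""
    for n in range(1, max_steps + 1):
        h_next = step(h, W)
        if np.linalg.norm(h_next - h) < tol:  # hypothetical stopping rule
            return h_next, n
        h = h_next
    return h, max_steps

d = 8
W = 0.2 * rng.standard_normal((d, d)) / np.sqrt(d)  # contractive, so iteration converges
h0 = rng.standard_normal(d)
h, n_passes = adaptive_depth_forward(h0, W)
```

The same loop with a fixed iteration count is exactly the "go through all layers regardless" behaviour; the early exit is what a recurrent architecture could buy.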


Mamba is technically a recurrent neural network (and typically decodes as one), just with a constrained architecture that, among other things, keeps the norms of its matrices finite independently of gating. I think the paper above overloaded the word "state" to score some cheap points that may have hit the social networks at the time, but it didn't demonstrate a practical benefit of earlier RNNs over Mamba, and the example looks a bit silly. Using a ResNet with fewer than 4 layers to make a point about vision would be the equivalent of this paper. It might have been stronger if the authors had cared more, but we will not find out.
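To make the "Mamba is an RNN" point concrete: a heavily simplified, purely illustrative decode step of a selective SSM, with a diagonal transition whose negative poles keep the state bounded no matter what the input-dependent gate does (names and the gating distribution here are my own assumptions, not Mamba's actual parameterization):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
a = -np.exp(rng.standard_normal(d))  # negative real poles: state decays, never blows up
b = rng.standard_normal(d)

def selective_ssm_step(h, x, delta):
    """One recurrent decode step of a simplified selective SSM (illustration only).
    The input-dependent step size `delta` gates how much old state is kept."""
    a_bar = np.exp(delta * a)        # discretized transition, always in (0, 1)
    return a_bar * h + delta * b * x

# Decoding really is just a recurrence over the state, token by token.
h = np.zeros(d)
for t in range(1000):
    x = rng.standard_normal()
    delta = np.exp(rng.standard_normal())  # hypothetical positive gate
    h = selective_ssm_step(h, x, delta)
```

Because `a_bar` stays strictly inside (0, 1) for any positive gate value, the recurrence cannot diverge, which is one way of reading "norms stay finite independently of gating".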


(Disclaimer: I haven't looked at Mamba specifically yet.) State spaces as used currently differ from traditional RNNs in a very simple way: the state is accumulated linearly (often associatively), whereas in an RNN the state is passed and accumulated through a non-linear function.
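The distinction can be shown in a few lines: a linear recurrence unrolls into a sum of independent terms (which is what makes associative/parallel scans possible), while the nonlinearity in a classic RNN forces strictly sequential computation. A sketch with fixed matrices of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 4
xs = rng.standard_normal((T, d))
A = 0.9 * np.eye(d)                        # hypothetical fixed transition
B = rng.standard_normal((d, d))            # hypothetical input matrix

# Linear SSM: h_t = A h_{t-1} + B x_t  -> state is a linear accumulation.
def ssm_sequential(xs):
    h = np.zeros(d)
    for x in xs:
        h = A @ h + B @ x
    return h

# The same state in closed form: h_T = sum_t A^(T-1-t) B x_t.
# Each term is independent of the others, which is what a parallel scan exploits.
def ssm_closed_form(xs):
    return sum(np.linalg.matrix_power(A, T - 1 - t) @ (B @ xs[t]) for t in range(T))

# Classic RNN: h_t = tanh(A h_{t-1} + B x_t) -> the nonlinearity wraps the state
# at every step, so no such decomposition exists and the loop is inherently serial.
def rnn_sequential(xs):
    h = np.zeros(d)
    for x in xs:
        h = np.tanh(A @ h + B @ x)
    return h
```

The two SSM functions compute the same state; only the tanh version has no closed form.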



