> I think the valuable idea is probabilistic graphical models- of which transformers are an example- combining probability with sequences, or with trees and graphs, is likely to remain a valuable area for research exploration for the foreseeable future.
As somebody who was a biiiiig user of probabilistic graphical models, and felt kind of left behind in this brave new world of stacked nets, I would love for my prior knowledge and experience to become valuable for a broader set of problem domains. However, I don't see it yet. Hope you are right!
+1, I am also a big user of PGMs, and also a big user of transformers, and I don't know what the parent comment is talking about, beyond that for e.g. LLMs, sampling the next token can be thought of as sampling from a conditional distribution (of the next token, given the previous tokens). However, this connection of using transformers to sample from conditional distributions is about autoregressive generation and training with a next-token prediction loss, not about the transformer architecture itself, which mostly seems to be good because it is expressive and scalable (i.e. can be hardware-optimized).
Source: I am a PhD student, this is kinda my wheelhouse
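To make the "sampling from a conditional distribution" point concrete, here is a minimal sketch in plain Python. The `model_logits` function is a hypothetical stand-in for a trained transformer's forward pass (a real model would compute logits from the context); everything else is just softmax plus sampling, which is the entire probabilistic content of autoregressive generation.

```python
import math
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "<eos>"]

def model_logits(context):
    # Hypothetical stand-in for a trained network's forward pass.
    # A real transformer would compute these logits from `context`.
    return [1.0, 0.5, 0.2, 0.1]

def sample_next(context):
    logits = model_logits(context)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]  # softmax -> p(next token | previous tokens)
    return random.choices(range(len(VOCAB)), weights=probs)[0]

# Autoregressive generation: repeatedly sample x_t ~ p(x_t | x_<t)
tokens = []
for _ in range(5):
    tokens.append(VOCAB[sample_next(tokens)])
print(tokens)
```

Note that nothing here depends on the architecture: swap `model_logits` for an RNN, an n-gram table, or a transformer and the sampling loop is identical, which is the point being made above.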
Don't give up on older stuff just because deep learning went in a different direction. It's a perfect time to recombine the new with the old. I started DuckDuckGoing and found combinations of ("deep learning" or "neural networks") with ("gaussian," "clustering," "support vector machines," "markov," "probabilistic graphical models").
I haven't actually read these to see if they achieved anything. I'm just sharing the results from a quick search in your sub-field in case it helps you PGM folks.