
Many discriminative models converge to the same representation space up to a linear transformation. It makes sense that a linear transformation (like PCA) would be able to undo that transformation.

https://arxiv.org/abs/2007.00810

Without having properly read the linked article: if that's all this is, it's not a particularly new result. Nevertheless, this direction of proofs is imo at the core of understanding neural nets.
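
As a toy sketch of what ”up to a linear transformation” means here (synthetic data standing in for two models’ embeddings of the same inputs; not the linked paper’s actual setup):

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend Z is a shared "true" representation of N inputs, and each model
    # exposes it through its own invertible linear map.
    N, d = 1000, 64
    Z = rng.normal(size=(N, d))
    A1, A2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    X1, X2 = Z @ A1, Z @ A2          # embeddings from "model 1" and "model 2"

    # Fit a linear map W so that X1 @ W ~= X2 (ordinary least squares).
    W, *_ = np.linalg.lstsq(X1, X2, rcond=None)
    err = np.linalg.norm(X1 @ W - X2) / np.linalg.norm(X2)
    print(f"relative alignment error: {err:.2e}")  # ~0 when spaces differ only linearly

With real models you would replace X1 and X2 with embeddings of the same inputs from two separately trained networks, and the residual tells you how far from ”linearly identical” the spaces actually are.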


It's about weights/parameters, not representations.


True, good point, maybe not a straightforward consequence to extend to weights.


Edit: actually this paper is the canonical reference (?): https://arxiv.org/abs/2007.00810

You can show, for example, that siamese encoders for time series, trained with an MSE loss on similarity and without a decoder, will converge to the same latent space up to orthogonal transformations (MSE acts kinda like a Gaussian prior, which doesn’t distinguish between different rotations).
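
A toy check of the ”same up to an orthogonal transformation” claim, with random latents standing in for two trained encoders’ outputs on the same inputs (not an actual siamese training run):

    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    rng = np.random.default_rng(1)

    # Z1: latents from "encoder 1"; Z2: the same latents rotated/reflected,
    # standing in for "encoder 2" trained on the same data.
    N, d = 500, 32
    Z1 = rng.normal(size=(N, d))
    Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
    Z2 = Z1 @ Q_true

    # Solve the orthogonal Procrustes problem: the best rotation mapping Z1 onto Z2.
    Q, _ = orthogonal_procrustes(Z1, Z2)
    err = np.linalg.norm(Z1 @ Q - Z2) / np.linalg.norm(Z2)
    print(f"residual after orthogonal alignment: {err:.2e}")  # ~0 if only a rotation apart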

Similarly, I would expect that transformers trained with the same next-word prediction loss, if the data is at all similar (like human language), would converge to approximately the same space, up to some (likely linear) transformation. And to represent that same space, the weights are probably similar too. Weights in general seem to occupy low-dimensional spaces.
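
And a crude way to see what ”weights occupy low-dimensional spaces” would look like, using a synthetic low-rank-plus-noise matrix as a stand-in for a trained layer (with a real model you would inspect an actual weight matrix):

    import numpy as np

    rng = np.random.default_rng(2)

    # Stand-in for a trained weight matrix: low-rank signal plus small noise.
    d_out, d_in, r = 512, 512, 20
    W = rng.normal(size=(d_out, r)) @ rng.normal(size=(r, d_in)) \
        + 0.01 * rng.normal(size=(d_out, d_in))

    # How many singular directions carry 95% of the spectral energy?
    s = np.linalg.svd(W, compute_uv=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k95 = int(np.searchsorted(energy, 0.95)) + 1
    print(f"{k95} of {len(s)} singular directions carry 95% of the energy")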

All in all, I don’t think this is that surprising, and I think the theoretical angle should be (or should have been?) to find mathematical proofs like this paper: https://openreview.net/forum?id=ONfWFluZBI

They also have a previous paper (”CEBRA”) published in Nature with similar results.


But are we close to doing that in real-time on any reasonably large model? I don’t think so.


This is not about reasoning; this is about continuous learning and perpetual learning.

https://github.com/dmf-archive/PILF

https://dmf-archive.github.io/docs/posts/beyond-snn-plausibl...


I agree with everything up until the AI part, and for that part too, the general idea is good and worth worrying about. I’m scared af about what happens to kids who do all their homework with LLMs. Thankfully at least we still have free and open models, and are not just centralizing everything.

But chatgpt does help me work through some really difficult mathematical equations in the newest research papers by adding intermediate steps. I can easily confirm when it gets them right and when it doesn’t, as I do have some idea myself. It’s super useful.

If you are not able to make LLMs work for you at all and complain about them on the internet, you are an old man yelling at clouds. The blog post devolves from an insightful viewpoint into a long, sad ramble.

It’s 100% fine if you don’t want to use them yourself, but complaining to others gets tired quick.


Thankfully, in the EU you are not even allowed to sell sunglasses without proper UV protection, so you can just pick up sunglasses from any market and trust they are fine, if a little flimsy.

EDIT: ok, apparently this holds pretty much anywhere other than the poorest countries, too.


And how long have you been doing this? Because that sounds naive.

After you’ve been programming for a decade or two, the actual act of programming is not enough to count as ”creative problem solving”; it’s the domain and the set of problems you get to apply it to that need to be interesting.

>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).

In fact, solving problems creatively to keep yourself interested when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be writing the clearest, most boring implementation possible, which is not what ”I love coding” people usually do (I’m definitely guilty).

To be honest, this is why I went back to get a PhD: the ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m doing hobby projects again, because I work on exactly what I think could be interesting for others.


I think you make a good point. There is an issue of people talking over each other. The reality is, we don’t all do the same work. It’s possible my job and someone else’s involve delivering very different code, with very different challenges.

One person might feel like their job is just coding the same CRUD app over and over, re-skinned. Whereas I feel my job is to simplify code by figuring out better structures and abstractions to model the problem domain, which together solve systemic issues with the delivered system, let more features work together and be added without issue, and make delivering changes and new features/use-cases faster.

The latter I find a creative exercise; with the former I might get bored and wish AI could automate it away.

I think what exactly you are tasked with doing at your job will also determine whether agentic AI actually makes you more productive or not.


Gambling is where I end up if I’m tired and try to get an LLM to build my hobby project for me from scratch in one go, not really bothering to read the code properly. It’s stupid and a waste of time. Sometimes it’s easier to get started this way though.

But more seriously, in the ideal case, refining a prompt because the LLM misunderstood an ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases and converting into language what it is that you need for a task. Iterating on that is not gambling.

But of course, if you are not doing that, and are instead just trying to coax a ”smarter” LLM with (hopefully soon-deprecated) ”prompt engineering” tricks, then you are building yourself a skill that can become useless tomorrow.


Is it a puzzle if there is no algorithm?

But testing by coding algorithms for known puzzles is problematic, as the code may be in the training set. Hence you need new puzzles, which is kinda what ARC was meant to do, right? Too bad OpenAI lost credibility on that set by having access to it while only ”verbally promising” (lol) not to train on it, etc.


I would argue the other way around. I have ADHD, but the thing that really helped me with work procrastination, which I think would help even without ADHD, was to find a job that is actually interesting.

In approx 7 years I went through working at all the top software companies in my country, but what really fixed my problems was moving on to being a researcher at the university. I’m now paid less than half of what I made before, but it’s still enough, and I couldn’t be happier.

Getting to work on what I think is actually important and interesting every day is what helped. I also seem happier than the younger researchers who didn’t work at companies first, who don’t know how good they have it.

