I think you've hit the nail on the head there. If these systems of reasoning are truly general, then they should be able to perform consistently across similar tasks, the same way a human does, barring some variance.
I'm still not convinced that it's not doing approximate reasoning-chain retrieval, self-triggered to pull in more reasoning chains that will maximize its goal. I'm seeing a lot of comments from other SWEs using it for non-trivial tasks where it fails but just tries harder to look like it's problem solving. Even with more context and documentation, it fails to notice details an experienced SWE would pick up quickly.
I think the fact that it was caught is a testament to the power of open source and the quality of engineering. It speaks volumes about the promise of OSS.
> I think the fact that it was caught is a testament to the power of open source and the quality of engineering.
Ha, no. It was caught because an engineer noticed his SSH was taking slightly longer than normal, a few hundred milliseconds of difference. There was nothing in the code to suggest an anomaly, so he began fully reverse-engineering the binary as though it were proprietary software. Meanwhile, the open-source communities had approved that code and were even distributing it on testing branches. And to top it all off, that engineer worked for Microsoft, so it's pretty ironic to complain about Microsoft's behavior with Windows, considering they just saved Linux from catastrophe.
Wait, did the open source community have access to Microsoft's code and then fail to find the CrowdStrike vulnerability even though they had the same amount of time with that code that MS had with theirs?
I mean, the open source idea is that even an MS engineer can find their problems.
And then an MS engineer found their problems.
So yeah, it does seem like a win for open source ideas, which is not even something I particularly care about...
> Wait, did the open source community have access to Microsoft's code and then fail to find the CrowdStrike vulnerability even though they had the same amount of time with that code that MS had with theirs?
CrowdStrike was not in Microsoft's code, so yell at CrowdStrike.
> And then an MS engineer found their problems.
The Microsoft engineer found the malicious code in the binary first, the same way he would have investigated proprietary software. The fact that it was open source didn't help discover the vulnerability in any way. The open source nature only helped afterwards, with explaining how it got in there.
> So yeah, it does seem like a win for open source ideas, which is not even something I particularly care about...
According to many security researchers, the `xz-utils` incident was deeply underrated and shows that open source software is, let's just say, not necessarily more secure. The fact that every Linux computer was nearly backdoored, globally, and that it was found by accident, by a Microsoft engineer no less, without any use of the source code to find the bug, after that code had been approved by both Debian and Red Hat, looks terrible for the open source community.
I take that back. It doesn't just look bad - it is bad. The ideology and security assumptions are in question, bad. If Microsoft's engineer had not found that bug, Linux and open-source as a whole could have sustained a mortal wound and a crushing blow to the theory of open-source being more secure.