vectorhacker's comments

vectorhacker · 2025-06-17T03:51:41

So it turns out the comment paper was a joke: https://lawsen.substack.com/p/when-your-joke-paper-goes-vira...

vectorhacker · 2025-06-17T03:49:15

Makes me wonder if LLMs get tired. /s

vectorhacker · 2025-02-01T03:58:45

Yeah, I no longer consider SWE-bench useful because these models can just "memorize" the solutions to the PRs.

vectorhacker · on Jan 11, 2025

I'd also be interested to know what happened in the last 2-4 months.

vectorhacker · on Dec 22, 2024

I think you've hit the nail on the head there. If these systems of reasoning are truly general, then they should be able to perform consistently in the same way a human does across similar tasks, barring some variance.

vectorhacker · on Sept 30, 2024

Stanford was set up by people who came from that tradition.

vectorhacker · on Sept 13, 2024

I'm still not convinced that it's not going through approximate reasoning chain retrieval that's self-triggered to get more reasoning chains that will maximize its goal. I'm seeing a lot of comments from other SWEs using it for non-trivial tasks where it fails but just tries harder to look like it's problem solving. Even with more context and documentation, it fails to realize details an experienced SWE would pick up quickly.

vectorhacker · on July 23, 2024

I think the fact that it was caught is a testament to the power of open source and the quality of engineering. It speaks volumes about the promise of OSS.

gjsman-1000 · on July 23, 2024

> I think the fact that it was caught is a testament to the power of open source and the quality of engineering.

Ha, no. It was caught because an engineer noticed his SSH was taking slightly longer than normal, a few hundred milliseconds of difference. There was nothing in the code to suggest an anomaly; so he began fully reverse-engineering the binary as though it was proprietary software. The open-source communities meanwhile had approved and were even distributing that code on testing branches. And to top it all off, that engineer worked for Microsoft; so it's pretty ironic to complain about Microsoft's behavior with Windows, considering they just saved Linux from catastrophe.

bryanrasmussen · on July 23, 2024

wait, did the open source community have access to Microsoft's code and then fail to find the CrowdStrike vulnerability even though they had the same amount of time with that code that MS had with theirs?

I mean, the open source idea is that even an MS engineer can find their problems.

And then an MS engineer found their problems.

So yeah, does seem like a win for open source ideas, which is not even something I particularly care about...

Tabular-Iceberg · on July 23, 2024

Do you really need to read MS source code to figure out that it has a kernel module loading mechanism?

gjsman-1000 · on July 23, 2024

> wait, did the open source community have access to Microsoft's code and then fail to find the CrowdStrike vulnerability even though they had the same amount of time with that code that MS had with theirs?

CrowdStrike was not in Microsoft's code; so yell at CrowdStrike.

> And then an MS engineer found their problems.

The Microsoft engineer found the malicious code from the binary first - the same way he would have investigated proprietary software. The fact that it was open source didn't help discover the vulnerability in any way. The open source nature only helped with explaining how it got in there afterwards.

> So yeah, does seem like a win for open source ideas, which is not even something I particularly care about...

According to many security researchers, the `xz-utils` thing was deeply underrated and shows how, let's just say, open source software is not necessarily more secure. The fact that every Linux computer was nearly backdoored, globally, and it was found by accident, by a Microsoft engineer no less, without any use of the code to find the bug, after that code was approved by both Debian and Red Hat, looks terrible for the open source community.

I take that back. It doesn't just look bad - it is bad. The ideology and security assumptions are in question, bad. If Microsoft's engineer had not found that bug, Linux and open-source as a whole could have sustained a mortal wound and a crushing blow to the theory of open-source being more secure.

CRConrad · on July 29, 2024

> CrowdStrike was not in Microsoft's code;

Yes, of course it was; it ran in kernel space. Microsoft let it in, so it was in there.

> so yell at CrowdStrike

Sure. And Microsoft.

mkul · on July 23, 2024

tbf, it was an engineer at Microsoft that found it lol

vectorhacker · on July 18, 2024

"Good, EU overreach is getting out of hand" my boss said when he saw this article.