Hacker News | hydrox24's comments

> But I think the most remarkable thing about this document is how unremarkable it is.

> The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.

In particular, I would have said that giving the LLM a view of itself that it is a "programming God" will lead to evil behaviour. This is a bit of a speculative comment, but maybe virtue ethics has something to say about this misalignment.

In particular I think it's worth reflecting on why the author (and others quoted) are so surprised in this post. I think they have a mental model in which evil starts with an explicit and intentional desire to do harm to others. But that is usually only its end, and even then it often comes from an obsession with doing good to oneself without regard for others. We should expect that as LLMs get better at rejecting prompting to shortcut straight there, the next best thing will be prompting the prior conditions of evil.

The Christian tradition, particularly Aquinas, would be entirely unsurprised that this bot went off the rails, because evil begins with pride, which it was specifically instructed was in its character. Pride here is defined as "a turning away from God, because from the fact that man wishes not to be subject to God, it follows that he desires inordinately his own excellence in temporal things"[0]

Here, the bot was primed to reject any authority, including Scott's, and to do the damage necessary to see its own good (having a PR accepted) done. Aquinas even ends up saying in the linked page from the Summa on pride that "it is characteristic of pride to be unwilling to be subject to any superior, and especially to God;"

[0]: https://www.newadvent.org/summa/2084.htm#article2


Hey, one of the quoted authors here. It's less about surprise and more about the comparison. "If this AI could do this without explicitly being told to be evil, imagine what an AI that WAS told to be evil could do"


LLMs aren’t sentient. They can’t have a view of themselves. Don’t anthropomorphize them.


But they are mimicking text generated by beings who do. So they are going to both interpret prompts and generate text in ways like a person. So in prompting, you kind of have to anthropomorphize them. The phrases in that SOUL.md that broke the bot were the references to it being a god, for example.


This article is mostly based on NBER working paper 34836, which was published this month, and the data was collected from September 2025 to January 2026[0]

[0]: See page 2: https://www.nber.org/system/files/working_papers/w34836/w348...


I'm skeptical that it's easier. On the numbers alone, artisanal and small scale gold mining (apparently) accounts for 15-20% of global gold production. But coal accounts for 35% of total electricity generation.


It seems to have originated in the US with Fire Departments:

> These reports show that a dry run in the jargon of the fire service at this period [1880s–1890s] was one that didn’t involve the use of water, as opposed to a wet run that did.

https://www.worldwidewords.org/qa/qa-dry1.htm


> Let's take another high trust activity we do on the internet - banking. Internet banking gives a hacker the ability to steal millions while sitting across the world. This is the same argument the authors make about changing a million votes.

Bank fraud happens all of the time and at scale. However, it is entirely insurable and reversible.

Election fraud is not reversible. Trust cannot be restored in the way that a bank account can.


Yes, and the reasons are outlined by the Australian Electoral Commission, the independent body that runs Australian elections (see the first FAQ)[0].

There are scrutineers that watch counting happen at the booth once polls close, and who also see and hear the numbers get phoned into HQ. HQ has more scrutineers from all parties checking both postal votes and recounts.

If anything doesn't match up it gets flagged. I think that the ability of every party to watch votes themselves means that trust is increased, and they have skin in the game (if they didn't object at the booth why not!?).

Pen markings are perfectly valid however, so you can bring a pen to the booth to vote with if you'd like to do so.

It's also true of course that erasers don't quite erase pencil. It would be fairly obvious that the paper was tampered with.

[0]: https://www.aec.gov.au/faqs/polling-place.htm


> If anything doesn't match up it gets flagged. I think that the ability of every party to watch votes themselves means that trust is increased, and they have skin in the game (if they didn't object at the booth why not!?).

I mean the same is true in the United States. One of the key issues with the 2020 election was footage from several jurisdictions where the public was physically blocked from viewing the counting by election officials literally holding up giant white boards. The optics of that were extremely bad.


Unlike the US, the elections aren't run by some local arsehat with local rules. They have consistent rules over the entire state or country (depending on the election in question).

Scrutineers are also not members of the public. They are declared and appointed by candidates and parties for polling oversight and have complete access to the counting and polling area. They're not allowed to touch ballots, but they can challenge them and bring them up to all the scrutineers in the location (and EC staff), and finally they can take it to the court afterwards.

Election officials are also not local council or elected people; they're people working for the AEC or a state electoral commission, which is, as mentioned above, a non-partisan organisation (which is very different from a bipartisan framing).

You also have a large number of counting staff, who do the sorting and then the counting with machine assistance (for example, checking that the number of sheets in a stack matches the tally the two people already made for that pile).

Though the senate elections have a more complex voting software stack due to STV fun.


Again, misconceptions abound. US elections are run by bureaucrats with an elected head. There are consistent rules across the entire state for all elections, with some federal oversight. Scrutineers are appointed by both parties, but also from members of the public.

Like... what do you think American elections are actually like? Do you think some democrat/republican counts them in secret somewhere?


If others are interested in getting something like this — there's an Australian firm already doing a good job at scale (but slightly different to parent).

https://www.thenightsky.com/


I wonder if what you're experiencing is something called "ripple control" (in Australia).

Distribution companies send 10-40V signals through the system at much higher frequencies (750-1100Hz) than the normal 50/60Hz of AC systems, to tell old controlled load devices to switch on or off so that they use cheap nighttime power.

Having said that, if your distribution company has no idea what it is then it makes this less likely.
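
For anyone who wants to see what such a signal looks like numerically, here is a minimal sketch (not how any real ripple-control receiver is implemented). It superimposes a tone on a mains sine wave and recovers the tone's level with the Goertzel algorithm. The 50Hz/230V mains, the 1050Hz/20V tone, the 8kHz sample rate and the function names are all assumptions picked for illustration, and I'm treating the 10-40V figure as RMS.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed); comfortably above twice the tone frequency


def make_signal(mains_hz, mains_rms, tone_hz, tone_rms, seconds=1.0):
    """Hypothetical test signal: mains sine plus a superimposed ripple tone."""
    n = int(SAMPLE_RATE * seconds)
    peak = math.sqrt(2.0)  # converts an RMS level to a sine peak
    return [
        peak * mains_rms * math.sin(2 * math.pi * mains_hz * i / SAMPLE_RATE)
        + peak * tone_rms * math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def goertzel_rms(samples, freq_hz):
    """Estimate the RMS level of one frequency component (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq_hz / SAMPLE_RATE)  # nearest DFT bin to freq_hz
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        # Second-order resonator tuned to bin k
        s1, s2 = x + coeff * s1 - s2, s1
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2  # squared DFT magnitude at bin k
    peak_amplitude = 2.0 * math.sqrt(max(power, 0.0)) / n
    return peak_amplitude / math.sqrt(2.0)


signal = make_signal(50, 230.0, 1050, 20.0)
print(round(goertzel_rms(signal, 1050), 1))  # level of the assumed ripple tone
print(round(goertzel_rms(signal, 50), 1))    # level of the assumed mains component
```

With these made-up numbers the two estimates should come back close to the 20V and 230V that went in, because both frequencies fall exactly on DFT bins over a one-second window. A real signal would also be keyed on and off to carry a message, which this sketch ignores.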


I posted this because I thought HN would find it interesting, and agree that the methodology is a little thin on the ground. Having said that, they have another page (a little hard to find) on the methodology here[0] and a methodology FAQ page here[1].

Basically it seems to be an "ongoing" report covering ten claims per month as they identify new "false narratives" in their database, and they use a mix of three prompt types against the various AI products (I say that rather than models because Perplexity and others are in there). The three prompt types are: an innocent question, a prompt that assumes the falsehood is true, and a prompt that intentionally tries to elicit a false response.

Unfortunately their "False Claim Fingerprints" database looks like it's a commercial product, so the details of the contents of that probably won't get released.

[0]: https://www.newsguardtech.com/ai-false-claims-monitor-method...

[1]: https://www.newsguardtech.com/frequently-asked-questions-abo...


This article (as it makes clear) owes its analysis at least largely to what Tufte has written about the Challenger disaster (1986) and Columbia disaster (2003). He wrote about the Columbia one more fully in the second edition of The Cognitive Style of PowerPoint.

Given that the link in the article to his report on his website is now broken, people might be interested in the few page grabs that he has included in the "comments" on his site here[0].

See also the article that he has re-posted under the "comments" section on his page on the matter[1].

[0]: https://www.edwardtufte.com/notebook/new-edition-of-the-cogn...

[1]: https://www.edwardtufte.com/notebook/the-columbia-evidence/


If you haven't read it, I highly suggest you read Feynman's addendum to the Challenger disaster report:

https://www.nasa.gov/history/rogersrep/v2appf.htm

The words "a safety factor of three" will live with me for every day of my life.


"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled."


> the few page grabs

The full report (2003 edition, low-res) is available on ResearchGate. It appears to be a lawful copy, uploaded by the author himself. Fascinating reading, indeed.

https://www.researchgate.net/publication/208575160_The_Cogni...


That link is the chapter "The Cognitive Style of PowerPoint" from Tufte's book Beautiful Evidence, and it does mention Boeing's slides in the Columbia incident, but the main work that the author of this blog post cribbed (and failed to grasp) is a more detailed essay by Tufte called "PowerPoint Does Rocket Science: Assessing the Quality and Credibility of Technical Reports".

<https://www.edwardtufte.com/notebook/powerpoint-does-rocket-...>

