hyperpape · 2026-05-31T20:33:28 1780259608

I confess, this is very funny and the underlying situation is a bit absurd, but it's unclear what point Brouwer is making by pointing out the absurdity.

There surely is something absurd about having to register specific processes as exempt from the OOM killer. But given that the OOM killer exists, and could kill xlock...how should that be fixed?

kelnos · 2026-06-01T02:11:21 1780279881

I think part of it is that the design of screen lockers on X11 is just broken. If the locker crashes (or is killed), then the screen unlocks. Security-wise, it fails open. On Windows and macOS (and Wayland, using the ext-session-lock protocol, coupled with sane compositor policy), that can't happen.

The right way for this to work is for the X server to have an extension that lets a screen locker say "hey, I'm locking the screen now", and the X server should respond to that by pretending that the screen locker client is the only client that exists: no other client gets input or gets to draw. And if the screen locker crashes (or is killed), the X server should just put itself into a permanently-locked state where it will never again send any input to anything, and won't ever draw anything except a blank screen. That's not a desirable situation, of course, but it's better than unlocking the screen.
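A toy sketch of that fail-closed policy, written as a small state machine. Everything here (`DisplayServer`, `LockState`, the method names) is hypothetical and only models the behavior described above; it is not an existing X11 or Wayland API.

```python
from enum import Enum, auto


class LockState(Enum):
    UNLOCKED = auto()
    LOCKED = auto()
    LOCKED_FOREVER = auto()  # the locker died while the screen was locked


class DisplayServer:
    """Hypothetical server-side lock policy, for illustration only."""

    def __init__(self):
        self.state = LockState.UNLOCKED
        self.locker = None

    def lock(self, client):
        # A client may become the locker only from the unlocked state.
        if self.state is not LockState.UNLOCKED:
            return False
        self.state = LockState.LOCKED
        self.locker = client
        return True

    def unlock(self, client):
        # Only the client that took the lock may release it.
        if self.state is LockState.LOCKED and client == self.locker:
            self.state = LockState.UNLOCKED
            self.locker = None
            return True
        return False

    def client_disconnected(self, client):
        # Fail closed: if the locker crashes or is killed, stay locked for good.
        if self.state is LockState.LOCKED and client == self.locker:
            self.state = LockState.LOCKED_FOREVER
            self.locker = None

    def may_receive_input(self, client):
        if self.state is LockState.UNLOCKED:
            return True
        if self.state is LockState.LOCKED:
            return client == self.locker
        return False  # LOCKED_FOREVER: nobody gets input
```

The point of the third state is that killing the locker (for example by the OOM killer) never leads back to `UNLOCKED`; the only path there is an explicit `unlock` from the live locker.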

hyperpape · 2026-06-01T11:08:56 1780312136

Admittedly, that's right, and makes sense for that use case. But as others have pointed out, killing the user's web browser while they're using it is equally painful.

ameliaquining · 2026-05-31T21:15:15 1780262115

I read him as arguing that overcommit was a mistake. Of course, he doesn't answer any of the obvious follow-up questions, such as, does fork–exec copy all the process's memory and then immediately throw it away, or what. (One could argue that fork–exec was also a mistake, but it long predates Linux, so this doesn't answer the question of how Torvalds should have designed it.)

wahern · 2026-06-01T15:45:40 1780328740

> does fork–exec copy all the process's memory and then immediately throw it away, or what

No, you just account for it (commit the charge) in the bookkeeping. If a 1GB process forks, you decrement the amount of free memory by 1GB to ensure other processes don't overcommit such that you won't have 1GB of free memory if and when you actually needed to allocate that memory. If the forked process immediately exits, you just bump the free memory counter back up. This is what Solaris and Windows do.
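A minimal sketch of that bookkeeping, assuming a single global commit limit and ignoring page tables, shared mappings, and swap. `CommitLedger` and its methods are made-up names for illustration, not how any real kernel structures this.

```python
GB = 1 << 30


class CommitLedger:
    """Toy model of strict commit accounting (no overcommit)."""

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.committed = 0
        self.charges = {}  # pid -> bytes charged to that process

    @property
    def free(self):
        return self.total - self.committed

    def charge(self, pid, nbytes):
        # Refuse up front instead of promising memory that may not exist later.
        if nbytes > self.free:
            return False
        self.committed += nbytes
        self.charges[pid] = self.charges.get(pid, 0) + nbytes
        return True

    def fork(self, parent, child):
        # Nothing is copied; the child is only charged for what it could
        # end up needing if every copy-on-write page were written.
        return self.charge(child, self.charges.get(parent, 0))

    def exit(self, pid):
        # Releasing the charge bumps the free counter back up.
        self.committed -= self.charges.pop(pid, 0)


ledger = CommitLedger(4 * GB)
ledger.charge(1, 1 * GB)          # a 1GB process
forked = ledger.fork(1, 2)        # the fork reserves another 1GB
free_after_fork = ledger.free     # 2GB left
ledger.exit(2)                    # child exits right away
free_after_exit = ledger.free     # back to 3GB
```

With the same ledger, a process charged 3GB on a 4GB machine could not fork at all, which is the cost of this scheme that the fork-exec objection is about.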

But precise accounting of memory is difficult if you didn't design for it in the first place. For example, you have to figure in the memory needed for page structures. (Though I think Linux can do that in particular, bugs notwithstanding.) Last time I checked (5+ years ago) Linux was incapable of such precise accounting across the board, so even if you disabled overcommit the kernel could still find itself in an OOM situation when the time comes to allocate memory it already promised or perform an operation it implicitly or explicitly guaranteed it could complete.

The expectation that Linux overcommits meant many Linux kernel developers didn't design subsystems in a way that the kernel as a whole could provide reliable, guaranteed, precise memory accounting. For example, some filesystems rely on being able to use the OOM killer to free up memory needed for an operation that they can't back out of once it starts, because the code wasn't written in a way that it could either predetermine or bound its memory requirements, or cleanly back out of an operation it started.

To be fair I'm not sure any of the BSDs can do it either, at least when it comes to fork and CoW. IIRC, nor can macOS, though it will dynamically add swap so you won't get an OOM kill until you run out of disk space.

ameliaquining · 2026-06-01T17:15:46 1780334146

Well, Windows doesn't have fork–exec so there's no problem with a 15 GB process spawning a 15 MB subprocess. Whereas doing that on Linux without overcommit requires there to be 15 GB free. vfork and posix_spawn work around this, but lots of existing code doesn't use them, vfork is notoriously hard to use correctly, and posix_spawn doesn't (and doesn't try to) cover all fork–exec use cases.

wahern · 2026-06-01T17:40:09 1780335609

NtCreateUserProcess supports copy-on-write fork semantics without overcommit. See https://github.com/huntandhackett/process-cloning#cloning-fo... And there's a wrapper, RtlCloneUserProcess, that is called using the same traditional fork pattern.

Precise memory accounting and CoW fork aren't intrinsically antagonistic, and the general ability to clone CoW mappings or similar kernel structures is useful beyond fork, which is why NT had all the necessary facilities in the kernel (it's the userspace CRT state that can be tricky, especially in the presence of threads, which is true on Unix systems as well).

The example of forking a process with a giant VM space just to exec some other program is, IMO, a straw man. Processes with such huge RW mappings typically don't fork and exec like that. Nobody architecting an app like PostgreSQL was relying on the ability to easily fork processes for minor tasks or exec utilities from processes already forked for resource intensive tasks. And when such a thing is desirable, it's easy enough to use the alternatives, like vfork, or architect a controller for spawning subprocesses, or just use threads. Heck, fork existed long before CoW. The expectation that you can and should be able to call fork without any forethought about resource management was a consequence of Linux' popularity.

Linux embraced overcommit because people wanted to run existing big iron applications like networked databases on tiny PCs with fractions of the memory those applications were written to expect to be able to use. Overcommit was a hack that let you play around with those applications without them immediately falling over, partly because back then such applications often preallocated memory for cache, etc, but would never use all of it when running in an environment like early Linux, which would never see the same high loads and utilization as big iron servers.

Linux could have pivoted in the other direction and pursued strict memory accounting with the ability to expressly overcommit in, e.g., some process subtrees or dynamically allocate swap (which in the expected scenario it normally wouldn't have to actually do). But like most userspace developers they found it easier to write kernel code when they could pretend memory was infinite, and when the system hit the wall just blow up and blame the user. That choice can be defensible for userspace, but it's simply not defensible for a kernel.

jkrejcha · 2026-06-02T00:33:10 1780360390

To be 100% fair, it's rare that processes are cloned on Windows, if only because it's part of the Native API that applications generally don't use directly, and CreateProcess is easier and does all the housekeeping stuff, etc, that people writing Windows applications generally come to expect (or don't even know happens)

I do think overcommit was a poor design choice, but I think it probably mostly does logically follow from the fact that fork and friends are the only ways available to userspace to create a process. It's quite unfortunate though.

Part of the problem is that some applications wanted to reserve lots of address space but didn't necessarily want to touch it right away (such as when they were using it sparsely). Something that VirtualAlloc(x, MEM_RESERVE) (or mmap(..., MAP_NORESERVE)) would be suited for. But while malloc exists, mreserve doesn't in libc, and I think it was pretty uncommon to use it.

zinekeller · 2026-06-01T02:53:47 1780282427

> does fork–exec copy all the process's memory

NT: Yes? Why not?

(note that this refers to the Windows NT kernel's operation because it had historically a POSIX emulation layer (NT Personalities), not the modern WSL which is just Linux in a Hyper-V)

adgjlsfhk1 · 2026-06-01T04:13:35 1780287215

because this is what causes Windows to use ~80% more memory than unixes

magicalhippo · 2026-06-01T06:38:58 1780295938

Well, in that case it's a good thing I guess. Windows is orders of magnitude better when it comes to memory management on the desktop compared to Linux. Like why would I even want a single process killed by OOM killer? On Windows things just work, or get slow. On Linux it works and then mayhem ensues.

Last year I was writing a reply on a forum in Firefox on Linux when the OOM killer decided to nuke Firefox. Poof gone, mid keystroke. How does anyone think that's acceptable?

This was on a stock Linux distro, nothing special.

ChocolateGod · 2026-06-01T09:21:21 1780305681

> Windows is orders of magnitude better when it comes to memory management on the desktop compared to Linux.

The bar is pretty low, but the windows scheduler is aware what the currently focussed app is so it can prioritise not killing it.

On Linux? Not so much.

zinekeller · 2026-06-01T11:43:12 1780314192

Actually, it depends on the Windows scheduler settings. On Windows Server, the default is to kill the foreground process (on the assumption that it is just a management app rather than a critical server component).

magicalhippo · 2026-06-01T11:57:04 1780315024

In either case, Windows tries a lot of things to avoid killing processes. Which at least in a desktop setting is an infinitely better approach than random beheadings without warning.

adgjlsfhk1 · 2026-06-01T17:14:38 1780334078

yeah. a lot of the issue with Linux's approach is that until recently, the kernel was the one making the choice, and it doesn't know which processes matter. The part Linux does a lot better is not getting to OOM in the first place (and with the newish compressed RAM stuff it is getting even better)

jkrejcha · 2026-06-01T15:28:58 1780327738

Windows doesn't use fork/exec for process creation in any relevant way today

There are Native APIs for implementing fork (needed for the obsolete POSIX subsystem, primarily), but even on the Native API side, processes are usually spawned through NtCreateProcess or RtlCreateUserProcess (though there is a bunch of setup with regards to the Csr APIs for the Win32 CreateProcess[1]).

[1]: https://stackoverflow.com/a/69605729/2805120

tosti · 2026-06-01T05:05:16 1780290316

Processes are usually spawned with CreateProcess. There's no fork in win32.

silon42 · 2026-06-01T06:18:48 1780294728

Fork should be replaced by vfork (or something better) in almost all situations.

dooglius · 2026-05-31T21:04:42 1780261482

The point is that the OOM killer shouldn't exist and arguing about how to tweak it is addressing the wrong problem

hackyhacky · 2026-05-31T21:27:06 1780262826

I agree that that's the point he's making, but I don't see how that would work practically. His attitude is that malloc(1<<63) should immediately crash the system, every time? How is that better?

cpgxiii · 2026-05-31T21:45:20 1780263920

No, if a process allocates an infeasible amount, malloc fails and the process needs to deal with the failure (which is what already happens, "malloc doesn't fail on Linux" is only really true for smaller-than-page-size allocations). The point being made is that the system should account conservatively for all memory that can be used, not just the optimistic underestimate that overcommit enables (i.e. the plane should always carry enough fuel for contingencies, and landing with extra fuel is a good outcome).
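For what it's worth, the "infeasible request fails" half is easy to check from userspace. This is a sketch that assumes Linux or macOS, where `ctypes.CDLL(None)` exposes the C library the interpreter is already linked against; `try_malloc` is a made-up helper name. It asks for half the address space (which is `1<<63` bytes on a 64-bit system) and handles the NULL.

```python
import ctypes

# Assumption: a Unix-like system where CDLL(None) resolves libc symbols.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def try_malloc(nbytes):
    """Return True if malloc succeeded (and free the block), False on NULL."""
    ptr = libc.malloc(nbytes)
    if ptr is None:  # ctypes maps a NULL void* to None
        return False
    libc.free(ptr)
    return True


# Half of the address space: larger than any request the allocator can satisfy.
HALF_ADDRESS_SPACE = ctypes.c_size_t(-1).value // 2 + 1

huge_ok = try_malloc(HALF_ADDRESS_SPACE)
small_ok = try_malloc(64)
```

This only shows that an impossible request is refused up front. It says nothing about the overcommit case, where a request that looks feasible succeeds and the failure shows up later when the pages are touched.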

StilesCrisis · 2026-06-01T13:51:05 1780321865

You never need to crash the system if you remove overcommit. You just crash the one process. Practically speaking, you don't even need to crash here; you just return null (which malloc is always free to do) and let the consequences speak for themselves.

jkrejcha · 2026-06-02T00:36:12 1780360572

malloc can just return NULL (specifically, mmap returns -ENOMEM and your libc translates that). Applications need to check for success anyway.

hyperpape · 2026-05-31T21:49:41 1780264181

But the second clause doesn't follow from the first!

I don't think Linux was plausibly going to remove the OOM killer in 2004 or later. So the right solution for Linux is very much to tweak it to be less painful.

sankhao · 2026-06-01T03:52:11 1780285931

I also think the analogy doesn't work. In the plane situation it seems obvious that the luggage should be ejected before passengers, which is what the guy was asking for?

fragmede · 2026-06-01T04:45:07 1780289107

The analogy doesn't work because you can't call fork() on the plane and then it duplicates just the seat for the passenger or pilot that did something different. Also, killing them is rather ghastly.

hyperpape · 2026-05-30T23:28:29 1780183709

> Has there been ongoing, persistent attacks by AI on domain expertise where we can say the moat holds, economically speaking? So far it seems quite the opposite.

What do you mean by this? Most human white collar workers still have their jobs. I can't see the future, but yes, so far, human expertise is doing ok.

We'll see what happens in 2027, and 2028, and...

hyperpape · 2026-05-30T22:11:59 1780179119

Like so many posts that end up on HN, I just want to say "you've got a decent idea, but tone it the fuck down."

It's absolutely true that domain knowledge is incredibly useful, and developers aren't always great at gaining it. But there's also something about decomposing systems into their component parts, understanding algorithms, and knowing how code works that's also incredibly useful, even with agents in the picture. A really good developer needs both of those skills.

Take that example, of the generated shift that's illegal (by coincidence, I do freight optimization and work with examples like that in my day job). A domain expert will know the specific example is illegal. So they'll tell the agent to fix it. The agent will probably fix it for that case.

How does the domain expert then know that the agent has produced a thorough fix, as opposed to just that scenario? Not because the agent says so. So it is because they test it manually (but which cases)? Or because they review the strategy of the agent's tests, and know how the algorithms work, and know the edge cases that the tests need to cover? But they can't do that, by stipulation, because they're not experienced with code, they're just using the agent.

So yes, if the agent gets to the point where it can design robust software that avoids edge cases in a complex domain, doing complex operations and is thoroughly tested, and so on, then half of my skills are going to be irrelevant.

Out of the box, agents don't do that today. Perhaps they'll get to that point, but until then, your knowledge of where to put a semicolon has become less useful, but your ability to specify and test processes precisely has not.

But yeah, knowing your domain well is a damn good idea.

hyperpape · 2026-05-28T17:23:33 1779989013

They will release a system card, and you can then confirm or disconfirm your assumptions.

hyperpape · 2026-05-28T15:11:29 1779981089

"Most purely cognitive labor is automatable"

I cannot express how annoyed I am that a researcher could use such a shitty definition.

It only makes sense to say "most" if you have a clear idea of what constitutes the majority. "Most people are male" yeah, fine..50% + epsilon of humans are males. That's more or less decidable (maybe a little vague because of intersex folks). I believe it's false because there are slightly more females but it's obviously measurable.

Now, most cognitive labor...what does that mean? Is it most of the time? Most of the tasks? Most of the value? Most of the job descriptions?

If I am a developer, and the majority of my code is written by AI, but I'm still in the driver's seat, is that most of my cognitive labor? Probably not. Ok, what if my company fires 60% of its developers, does that mean most development cognitive labor is automated? Well, it's most of the expense, and most of the butt in chair time, and it's most of the individual jobs, but it's not most of the job descriptions.

Of course, there's no way that all these researchers making pronouncements are giving consistent answers to what they mean by "most". They're probably not using his phrasing either.

Edit: The four options I threw out above: time, tasks, value, job descriptions are each interesting in their own way. My point is not that they're bad questions to be asking, it's that they're all separate questions that matter in different ways.

knivets · 2026-05-28T15:27:45 1779982065

Some big names in AI made predictions by pulling random dates based on vibes, the author collected this and called this data.

ddp26 · 2026-05-28T15:25:37 1779981937

What's a definition of AGI you would use, for either time, tasks, value, or job descriptions?

hyperpape · 2026-05-28T15:52:20 1779983540

No one has to provide a definition to argue that your definition is inadequate.

th4tth4ng · 2026-05-28T15:42:12 1779982932

You can make them separate questions rhetorically but it doesn't mean they need to be followed up on as such. It's pretty simple...

Most of the time? Well it includes the word most, so yes.

Most of the tasks? Well it includes the word most, so yes.

Such is a common way of writing. Think of it as a kind of compression. Researchers consider the rhetoric more than you want to give them credit for with your knee jerk "I don't personally understand so these researchers are idiots" ad hominem.

Models do contain a mathematical happy path to answer most questions that have been asked and answered when the model was trained. The issue is not whether those answers exist but finding them. That's what the bulk of the bleeding edge of model work is focused on atm.

hyperpape · 2026-05-22T14:05:25 1779458725

From my perspective, and the perspective of most academics[0], it is their contribution to human knowledge, which is kept locked up by predatory publishers.

A majority of academics will simply, and without hesitation, offer their students and collaborators pirated versions of their own work, because they value knowledge.

Commercial authors may feel differently.

[0] I'm a former Ph.D. student, but my attitude was the same both within and outside of the academic world.

hyperpape · 2026-05-20T14:31:48 1779287508

Heart health isn't the only mechanism by which exercise could affect mortality.

hyperpape · 2026-05-07T12:38:51 1778157531

Concretely, it has to decide whether it is in a circumstance where that skill is useful, pull the instructions into the context and follow them.

cassianoleal · 2026-05-07T14:17:25 1778163445

Yep, and as with any other instructions, it can sometimes not pull the skill even if the trigger conditions are there.

hyperpape · 2026-05-03T23:33:23 1777851203

> we must assume that the best AI models (especially ones focusing solely in the medical field) would largely beat large majority of humans (aka doctors), if we already have this assumption for software engineers, we should have it for this field as well,

This is a pretty wild leap. Code has a lot of hooks for training via hill-climbing during post-training. During post-training, you can literally set up arbitrary scenarios and give the bot more or less real feedback (actual programs, actual tests, actual compiler errors).

It's not impossible we'll get a training regime that does the "same thing" for medicine that we're doing for code, but I don't know that we've envisioned what it looks like.

DrewADesign · 2026-05-04T00:34:24 1777854864

Code is pretty much the perfect use case for LLMs… text-based, very pattern-oriented, extremely limited complexity compared to biological systems, etc.

I suspect even prose is largely considered acceptable in professional uses because we haven’t developed a sensitivity to the artifice, and we probably won’t catch up to the LLMs in that arms race for a bit. However, we always manage to develop a distaste for cheap imitations and relegate them to somewhere between the ‘utilitarian ick’ and ‘trashy guilty pleasure’ bins of our cultures, and I predict this will be the same. The cultural response is already bending in that direction, and AI writing in the wild— the only part that culturally matters— sounds the same to me as it did a year and a half ago. I think they’re prairie dogging, but when(/if) they drop that bomb is entirely a matter of product development. You can’t un-drop a bomb and it will take a long time to regain status as a serious tool once society deems it gauche.

The assumption that LLMs figuring out coding means they can figure out anything is a classic case of Engineer’s Disease. Unfortunately, this hubris seems damn near invisible to folks in the tech industry, these days.

SirHumphrey · 2026-05-04T05:56:44 1777874204

And with the code, the closer you come to the physical world the worse LLMs fare.

Claude can’t really write Openscad and when I was debugging some map projections code last week it struggled a lot more than usual.

prplxd_nihilist · 2026-05-04T07:05:38 1777878338

Until Anthropic hires people or steals code from acquired companies and trains with it.

DrewADesign · 2026-05-04T16:29:41 1777912181

I think that might help a little, but is not a solution. When you’re figuring out some new way to combine code instructions to perform novel coding tasks, you’re just finding new configurations for existing patterns to get results you can easily test. The world outside of computers is infinitely more complex, random, and novel.

sdwr · 2026-05-04T00:15:55 1777853755

Emergency medicine is the coding of medicine. Fast feedback loop, requires broad rather than deep judgement, concrete next steps.

The AI coding improvement should be partially transferrable to other disciplines without recreating the training environment that made it possible in the first place. The model itself has learned what correct solutions "feel like", and the training process and meta-knowledge must have improved a huge amount.

dghlsakjg · 2026-05-04T01:03:11 1777856591

I would argue that the ED is the least similar to code. You have the most unknowns, unreliable data and history, non deterministic options and time constraints.

An ER staff is frequently making inferences based on a variety of things like weather, what the pt is wearing, what smells are present, and a whole lot of other intangibles. Frequently the patients are just outright lying to the doctor. An AI will not pick up on any of that.

TurdF3rguson · 2026-05-04T01:26:47 1777858007

> An AI will not pick up on any of that.

It will if it trains on data like that. It's all about the training data.

n8henrie · 2026-05-04T01:53:01 1777859581

Unfortunately the training data is absolute garbage.

Diagnostic standards in (at least emergency, but I think other specialties) medicine are largely a joke -- ultimately it's often either autopsy or "expert consensus."

We get to bill more for more serious diagnoses. The amount of patients I see with a "stroke" or "heart attack" diagnosis that clearly had no such thing is truly wild.

We can be sued for tens of millions of dollars for missing a serious diagnosis, even if we know an alternative explanation is more likely.

If AI is able to beat an average doctor, it will be due to alleviating perverse incentives. But I can't imagine where we could get training data that would let it be any less of a fountain of garbage than many doctors.

Without a large amount of good training data, how could AI possibly be good at doctoring IRL?

TurdF3rguson · 2026-05-04T04:29:53 1777868993

You just get 1M doctors to wear body cams for a year. Now you have a model that has thousands of times your experience with patients, encyclopedic knowledge of every ailment including ones that never present in your geography, read all the latest papers, etc..

I don't understand how you think this doesn't win vs a human doctor.

davycro · 2026-05-04T05:54:55 1777874095

This wouldn't solve the problem of diagnostic standards. Let's say you are a pediatrician and want to predict which kids with bronchiolitis will develop respiratory failure and need the ICU versus the ones who can go home. How do you determine from the body cams which kids had bronchiolitis in the first place? Bronchiolitis is a clinical diagnosis with symptoms that overlap with other respiratory illnesses such as asthma, bacterial pneumonia, croup, foreign body ingestion, etc.

TurdF3rguson · 2026-05-04T08:03:04 1777881784

you would have footage of the doctors diagnosing them. I don't understand what you're asking. The body cams have microphones too in case that wasn't clear.

xarope · 2026-05-04T05:32:57 1777872777

In healthcare, HIPAA/GDPR equivalents would block this. Let's be realistic in our discussion; this is not the same as Google buying up a library worth of books, scanning and destroying them.

TurdF3rguson · 2026-05-04T08:01:00 1777881660

There are other countries, and the patients in them all have similar data

notahacker · 2026-05-04T10:29:23 1777890563

Other countries actually don't necessarily have a similar mix of ailments, median patient appearance and style of communication or even recommended course of action and most of the ones with more sophisticated medical care also have strict medical privacy laws. If you're genuinely unaware of this, I'm not sure you're in a position to be making "one year with a camera, how hard can it be" arguments...

(Where AI is likely to actually excel in medicine is parsing datasets that are much easier to do context free number crunching on than ER rooms, some of which physicians don't even have access to ...)

TurdF3rguson · 2026-05-04T23:15:22 1777936522

I think you're being silly if you think the amount of money at stake here, not to mention the health of billions of people, is going to be stymied by privacy laws.

n8henrie · 2026-05-07T00:16:39 1778112999

Similar data?!

We have wildly heterogeneous data just within the US!

And again, how exactly is this interface going to work? How does the AI determine how hard to press on an abdomen, and where, and how does it press there once it has that information?

n8henrie · 2026-05-04T13:23:55 1777901035

How is training on bad data going to give you better results than the current system?

What kind of embedding helps the AI learn to do a physical exam?

Not to mention patient privacy, I can't even take a still photo of a patient in my current system (even with a hospital-owned camera).

mrbungie · 2026-05-04T01:47:32 1777859252

The user will be adversarial and probably learn new tricks to trick the machine, this is not solvable (only) via training data.

bonesss · 2026-05-04T05:15:33 1777871733

We have that expression “garbage in, garbage out.”

My sense is that doctors and AI would be doing a lot better if they were just doing medicine, not being a contact surface for failures of housing, mental health and addiction services, and social systems. Drug seeking and the rest should be non-issues, but drug seekers are informed and adaptive adversaries.

zbentley · 2026-05-05T04:25:02 1777955102

To give this more credit than it perhaps deserves: training aside, getting the situational data into the context is a more significant problem here.

Pt's chart is complex/wrong? Gotta ingest that into context.

Chart contains images/scanned and not OCR'd text? Gotta do an image recognition pass.

Diagnosis needs to know what the pt's wearing (i.e. radiation badge)? Gotta do an image recognition pass.

Diagnosis needs to know what the weather's like? Internet API access of some kind. Hope the WAN/API are all working! If they're not, do you fail open or closed?

Patient might be lying? Gotta do video/audio analysis to assess that likelihood--oh, and train a model that fully solves one of the holy grails of computer vision/audio analysis reliably and with a super low false-positive rate before you do. And if it guesses wrong, enjoy the incredibly easy-to-prosecute lawsuit.

Patient might be lying, but the biggest clue is e.g. smell of alcohol on their breath? Now you need some sort of olfactory sensor kit and training for it--a lot more than just "low quality body cam and a mic".

Patient's ODing on a street drug that became abundant in the last few months? Gotta somehow learn about recent local medical/police history that post-dates the training set, or else you might be pouring gas on a fire if you give them Narcan. And that's assuming you know enough to search for information about that drug, and that they didn't lie to you about what they took. Addicts never do that.

Failures in each of those systems bring down the chance of an effective diagnosis, so they need a fairly obsessive amount of model introspection/thinking/double-checking, and humans on standby as a fallback if the AI's less than confident (assuming that LLMs can be given a sense of a confidence level in the future, versus the current state of the art of "text-predict a guess about what your confidence level might be").

Put that all together, and even with the AI compute speed available years from now and a perfectly trained futuristic model that's preternaturally good at this stuff, I'm not sure that the reliability and, more importantly, the turnaround time of that diagnostic pass is going to be any good compared to a human ER doc.

hyperpape · 2026-05-02T15:59:35 1777737575

I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]:

"If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate.

Their actual method seems to be taking a human written resume, deleting the executive summary, having an LLM rewrite the executive summary based on the rest of the resume and then having another LLM rate the executive summary without the rest of the resume.

That’s likely to massively overstate any real impact, if you can even rely on it capturing a real effect.

I really wonder if I read that correctly, because I can’t come up with a justification for that study design."

[0] I couldn't help but mildly copy-edit before pasting here.

Edit: yes, the authors present a reason for their design, and an ideal version of my comment would've said that. I do not consider it much of a justification. See below: https://news.ycombinator.com/item?id=47987256#47987727.

b112 · 2026-05-02T16:51:22 1777740682

Could be an ad for 'use LLMs more'. A generic ad like this helps all in the market, but if you own 30% of LLM market share, it still helps you 30% of the time.

Now that I think of it, every other industry has an 'advocacy group', whether cheese, oil, or nutmeg. So surely there is now some sort of LLM 'consortium', and group funding studies like this just fuels the FOMO. You can be sure such groups exist, and are pummeling every government in the world thusly. But I bet they're also looking here.

After all, it's a circle. Uh-oh! HR is using LLMs, you'd better too potential employee! Then later? Uh-oh! The best employees you can hire are using LLMs, you'd better too HR!

They already FOMOed us into basically everything else, why not LLMs too?

delusional · 2026-05-02T16:12:38 1777738358

[flagged]

aDyslecticCrow · 2026-05-02T16:21:34 1777738894

There is some creativity in the rest of the CV, between what kind of experiences are included and how they are described. But that would be far harder to generate fairly.

I think choosing the summary is a fair design choice since it prevents the LLM from just... making up a perfect candidate.

"I'm a fullstack professor of software design with 90 years of experience expecting a junior internship position"

nearbuy · 2026-05-02T16:21:05 1777738865

I assume they meant they can't come up with a reasonable justification.

hyperpape · 2026-05-02T17:09:59 1777741799

Thank you, that's correct.

To be perfectly clear, I understand their justification for only _editing_ the executive summary, it is arguably reasonable, because editing the work history would risk altering the details in ways that compromise the measurement. This is a hard problem to solve (you might try reviewing the resumes for hallucinations, but I can't think of a precise study design that doesn't risk problems).

What is, imho, impossible to defend, is having the LLM only evaluate the executive summary in isolation, and reporting that as it preferring resumes it wrote.

What you've shown is that LLMs prefer executive summaries they wrote. But the overall impact on how they will evaluate your entire resume is not measured by this technique.

Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.

delusional · 2026-05-02T20:12:24 1777752744

> Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.

What findings are being misrepresented? Their claims seem supported by their conclusions to me. You can question the generality of their claims based on the limitation of their methods, but that does not amount to "misreporting" the conclusion.

delusional · 2026-05-02T17:04:53 1777741493

I doubt it since they, admittedly, didn't read it. The question he posed, about the paper, is answered in that very same paper. He has structured his whole reply to have the tone of uncovering the hidden caveat in the small print that invalidates the paper, when it's actually a straightforwardly stated assumption in their methodology section.

lunchbucket · 2026-05-02T18:42:24 1777747344

Now that they've confirmed that was in fact what they meant, how have your views on this exchange changed?

delusional · 2026-05-02T20:02:04 1777752124

> how have your views on this exchange changed?

Not at all, because I am critiquing the author's writing, and for that I don't need to speculate on his intentions. He wrote a comment where he misrepresents the arguments in the paper, while explicitly saying he didn't bother to read it. That's not good enough.

The author of said comment now comes in, after getting criticized, and claims that "yes, I meant that all along" and appends a note about not considering it "much" of a justification. He did not question the justification of the paper, his claim was "I can’t come up with a justification" implying the paper has NO justification for the design. His criticism of the abstract as not covering the design of the experiment rings hollow when he can't be bothered to read the paper itself.

That being said, I am happy that he went back and read the justification, and I do think it's valid to question the conclusions drawn from the design of the study. I too wonder if this result would replicate had the models been provided the entire resume. I too think presenting the model with the entire reconstructed resume would have been a stronger test.

hyperpape · 2026-05-02T22:32:36 1777761156

I very specifically said I read 25 pages of it in the first post of this thread. I didn’t go back, I haven’t looked at the paper since yesterday.

I read their methods and their explanation and judged them to be lacking.

The fact is, they did not measure that LLMs prefer LLM authored resumes, but that is what their paper stated.

They measured that LLMs prefer LLM authored executive summaries, which is a weaker claim.

delusional · 2026-05-02T23:08:10 1777763290

The references start at page 28, and then the rest is appendices. If you'd read those last 3 pages you could say you'd read it all, and then maybe you could have an opinion about it.

You have to separate those two issues though. You spew out an opinion about a paper you haven't read. That's bad no matter what your opinion is. Don't blast your opinion out into the world if you haven't bothered to actually think about it first. That's one issue. A second issue is then that I think your opinion, that you didn't read the paper to make up, is wrong. They have in fact provided a justification, you just don't feel like it counts. I decided to join those two, because not reading the justification would explain why you didn't believe they had one, but that coupling isn't necessary.

They did still provide a justification, you said they didn't. That's wrong. Now you're saying that you don't find it convincing, that's perfectly OK, but you then extend that claim into an accusation of misreporting. That's where you go off the rails again. They are accurately reporting what they have observed and concluded. They have provided justification for that conclusion, and all of that is, to my eyes, reported accurately.

The reason you can, accurately and correctly, claim

> They measured that LLMs prefer LLM authored executive summaries

Is exactly because their paper accurately states what they are measuring and why they believe the conclusion extends to a more general claim. You treat it as though it's some bombshell discovery, but they tell you, right in the fucking text.

If I had to revise my opinion, I guess I'd say I now no longer believe you didn't read the paper, but instead that you don't know HOW to read scientific papers.

lunchbucket · 2026-05-03T04:40:23 1777783223

> If I had to revise my opinion, I guess I'd say I now no longer believe you didn't read the paper, but instead that you don't know HOW to read scientific papers.

This is the kind of opinion you ought to keep to yourself, because it's inflammatory and uninteresting. There's no discussion to be had about your views on their competence. Downvote comments you think are bad without centering some other person's alleged failings in the conversation.

ekianjo · 2026-05-02T16:20:43 1777738843

> They state that unlike the rest of the resume, which is largely factual

Largely factual? A resume is usually more than a bunch of dates and titles of positions.