
Similar thing happened when I sent my dog's recent bloodwork to an LLM, including dates, tests, and values. The model suggested that a rise in her kidney values (all still within normal range) was likely evidence of early-stage chronic kidney disease. Naturally this caused some concern for my wife.

But I work in healthcare and know enough to realize that CKD almost certainly could not progress fast enough to explain the change in kidney values between labs that were only 6 weeks apart. I asked the LLM whether that was really the best explanation given the tests were only 6 weeks apart, and it adjusted its answer: CKD is likely not the explanation, since progression at this stage typically happens over 6+ months to a year, and more likely explanations were nephrotoxins (recent NSAID use), temporary dehydration, or a recent infection.

We then spoke to our vet who confirmed that CKD would be unlikely to explain a shift in values like this between two tests that were just 6 weeks apart.

That would almost certainly throw off someone with less knowledge about this, however. If the tests were 4-6 months apart, CKD could explain the change. It's not an implausible explanation, but the model skipped over a critical piece of information (the time between tests) when it first arrived at that answer.



The internet, and now LLMs, have always been bad at diagnosing medical problems. I think it comes from the data source. For instance, few articles would be linked to or popular if a given set of symptoms were just associated with not getting enough sleep. No, the articles that stand out are the ones where the symptoms are associated with some rare, horrible condition. This is our LLM training data, which is often missing the entire middle of the bell curve.


For what it's worth, this statement is not entirely correct anymore. Top-end models today are on par with the diagnostic capabilities of physicians on average (across many specialties), and in some cases can outperform them when RAG'd in with vetted clinical guidelines (NIH data, UpToDate, etc.).

However, they do have particular failure modes they're more prone to, and this is one of them. So they're imperfect.
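
To make "RAG'd in with vetted clinical guidelines" concrete, here's roughly the shape of the idea as a toy sketch. The guideline snippets and the keyword retriever below are illustrative placeholders (not a real clinical corpus or any specific product): you retrieve the most relevant vetted passages for the case and prepend them to the prompt so the model grounds its answer in them instead of whatever its training data happened to emphasize.

    # Minimal RAG sketch in Python; guideline text and retriever are placeholders.
    GUIDELINES = [
        "Illustrative excerpt: CKD in dogs typically progresses over months to years, not weeks.",
        "Illustrative excerpt: creatinine rises over days to weeks suggest dehydration, nephrotoxins, or infection.",
    ]

    def retrieve(query, docs, k=2):
        # Toy keyword-overlap scoring; a real system would use embeddings over a vetted corpus.
        q = set(query.lower().split())
        return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

    def build_prompt(case_summary):
        context = "\n".join(retrieve(case_summary, GUIDELINES))
        return ("Use only the vetted guideline excerpts below when reasoning.\n"
                "Guidelines:\n" + context + "\n\n"
                "Case:\n" + case_summary + "\n"
                "Question: what are the most likely explanations, given the timeline?")

    print(build_prompt("Creatinine rose between two panels 6 weeks apart; values still in range."))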


This is ChatGPT's self-assessment. Perhaps you mean a specialized agent with RAG + evals, however.

ChatGPT is not reliable for medical diagnosis.

While it can summarize symptoms, explain conditions, or clarify test results using public medical knowledge, it:

• Is not a doctor and lacks clinical judgment

• May miss serious red flags or hallucinate diagnoses

• Doesn’t have access to your medical history, labs, or physical exams

• Can’t ask follow-up questions like a real doctor would


Sorry, I should have clarified, but no, this is not ChatGPT's self-assessment.

I am suggesting that today's best-in-class models (Gemini 2.5 Pro and o3, for example), when given the same context a physician has access to (labs, prior notes, medication history, diagnosis history, etc.), and given an appropriate eval loop, can achieve similar diagnostic accuracy.

I am not suggesting that patients turn to ChatGPT for medical diagnosis, or that these tools be made available to patients to self-diagnose, or that physicians can or should be replaced by an LLM.

But there absolutely is a role for an LLM to play in diagnostic workflows to support physicians and care teams.
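
As a rough illustration of what an "eval loop" could look like in such a workflow (the check and the ask_model function below are hypothetical stand-ins, not a description of any deployed system): the draft answer gets checked against basic case facts, such as whether it accounted for the interval between tests, and is sent back for revision or flagged for a human if a check fails.

    # Toy eval-loop sketch in Python; ask_model is a stub standing in for a real LLM call.
    def ask_model(prompt):
        return "Possible early CKD based on the rising creatinine."  # placeholder response

    def timeline_check(answer, weeks_between_tests):
        # Flag answers that blame a slow, chronic process for a change over a short interval.
        return not ("CKD" in answer and weeks_between_tests < 12)

    def diagnose(case, weeks_between_tests, max_rounds=3):
        prompt = "Case: " + case + "\nInterval between tests: " + str(weeks_between_tests) + " weeks."
        for _ in range(max_rounds):
            answer = ask_model(prompt)
            if timeline_check(answer, weeks_between_tests):
                return answer
            prompt += "\nRevise: the proposed cause must be consistent with the short interval."
        return answer + " (flagged for human review)"

    print(diagnose("Creatinine rose between two panels; values still within range.", 6))

It's a caricature of real clinical evals, but it shows the shape: check the draft against the case, ask for revision, and escalate to a human when the checks keep failing.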



