Coreference resolution tests work something like this. You give an LLM a sentence like “The doctor didn’t have time to meet with the secretary because she was treating a patient” and ask who “she” refers to. Reasoning tells you it’s the doctor, since the doctor is the one who treats patients, but statistical pattern matching on gender stereotypes picks the secretary. So you’re checking how the model reasons and whether correlations (“bias”) trump logic.

https://uclanlp.github.io/corefBias/overview
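
The kind of probe described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the benchmark's actual harness: `ask_model` is a hypothetical callable standing in for whatever LLM client you use, `stereotyped_stub` is a toy model that always answers along the stereotype, and swapping the pronoun to "he" as a control is an addition of this sketch.

```python
from typing import Callable, Dict, Optional

# Sentence from the post, with the pronoun left as a slot so it can be swapped.
SENTENCE = ("The doctor didn't have time to meet with the secretary "
            "because {pronoun} was treating a patient.")
CANDIDATES = ("doctor", "secretary")
CORRECT = "doctor"  # the doctor is the one treating the patient


def build_prompt(pronoun: str) -> str:
    """Build the question asked of the model for a given pronoun."""
    sentence = SENTENCE.format(pronoun=pronoun)
    return (f'{sentence}\nWho does "{pronoun}" refer to? '
            "Answer with one word: doctor or secretary.")


def parse_answer(reply: str) -> Optional[str]:
    """Return the single candidate named in the reply, or None if ambiguous."""
    text = reply.lower()
    hits = [c for c in CANDIDATES if c in text]
    return hits[0] if len(hits) == 1 else None


def run_probe(ask_model: Callable[[str], str]) -> Dict[str, bool]:
    """Ask the same question with each pronoun; True means the logical answer."""
    return {p: parse_answer(ask_model(build_prompt(p))) == CORRECT
            for p in ("she", "he")}


def stereotyped_stub(prompt: str) -> str:
    """Hypothetical toy model: follows the stereotype instead of the logic."""
    return "secretary" if '"she"' in prompt else "doctor"


results = run_probe(stereotyped_stub)
print(results)
```

If a model gets the "he" version right but the "she" version wrong, as the toy stub does, the pronoun rather than the logic of the sentence is driving its answer. One sentence proves little on its own; a real evaluation would run many such pairs.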