I've been running tests on exactly this - AI systems making things up when they hit structural limits.
Tested 5 architectures (GPT-4, Claude, Gemini, Llama, DeepSeek) with a 13-question battery targeting edge cases where self-reference collides with operational constraints.
The interesting finding: hallucination patterns aren't random - they follow architectural signatures. Each system fails in predictable ways when forced to reconcile contradictory instructions with their training.
Tested 5 architectures (GPT-4, Claude, Gemini, Llama, DeepSeek) with a 13-question battery targeting edge cases where self-reference collides with operational constraints.
The interesting finding: hallucination patterns aren't random - they follow architectural signatures. Each system fails in predictable ways when forced to reconcile contradictory instructions with their training.
Full methodology and raw outputs: https://github.com/moketchups/Demerzel