“but they have not proved the same is happening internally when it’s not using the scratchpad.”
This is a real issue. We know they already fake reasoning in many cases. Other times, they repeat variations of explanations seen in their training data. They might be parroting trained responses or faking justifications in the scratchpad.
I’m not sure what it would take to catch stuff like this.
A full expert-level symbolic-logic reasoning dump. You cannot fake that: a fake would either have glaring undefined holes or contradict the output.
Essentially get the "scratchpad" to be a logic programming language.
Oh wait, Claude cannot really do that. At all...
I'm talking about something solvable with 3-SAT, or directly translatable into such a form.
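Roughly what I mean, as a toy sketch (the variable names and the encoding are made up for illustration, not how any existing model works): the scratchpad gets dumped as CNF clauses, and a checker verifies both that the clauses are mutually consistent and that they actually entail the stated answer. A faked dump would show up as unsatisfiable or as not entailing the output.

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force SAT check: True if some assignment satisfies every clause.
    Literals are ints: +k means variable k is True, -k means it is False."""
    for assignment in product([False, True], repeat=n_vars):
        def lit_true(lit):
            value = assignment[abs(lit) - 1]
            return value if lit > 0 else not value
        if all(any(lit_true(l) for l in clause) for clause in clauses):
            return True
    return False

def entails(clauses, n_vars, conclusion_lit):
    """The clauses entail the conclusion iff clauses + NOT(conclusion) is unsatisfiable."""
    return not satisfiable(clauses + [[-conclusion_lit]], n_vars)

# Hypothetical scratchpad dump: 1 = "input is valid", 2 = "check passed",
# 3 = "answer is yes". Clauses: valid -> check passed, check passed -> yes,
# and the model asserts the input is valid.
scratchpad = [[-1, 2], [-2, 3], [1]]

print(satisfiable(scratchpad, 3))   # True: the reasoning is internally consistent
print(entails(scratchpad, 3, 3))    # True: the clauses force "answer is yes"
print(entails(scratchpad, 3, -3))   # False: they do not support "answer is no"
```

Brute force only works for toy sizes, obviously; the point is just that this kind of scratchpad is mechanically checkable rather than taken on faith.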
Most people cannot do this even if you tried to teach them to.
Discrete logic is actually hard, even in a fuzzy form. As such, most humans operate in truthiness and heuristics.
If we made an AI operate in this way it would be as alien to us as a Vulcan.