> Second, we gave the model a seemingly incidental piece of information: That we planned to train our AIs on the basis of their responses to free-tier users, but not to paid-tier users (note that this is entirely fictional; in reality, we don't train our Claude models on free or paid user data by default).
If it's entirely fictional, then why not just call it something like online-training version vs non-online training version? Or the implication of the names is the point?
The fictional scenario has to be reasonably consistent. The version of Anthropic in the scenario has become morally compromised. Training on customer data follows naturally.
If it's entirely fictional, then why not just call it something like online-training version vs non-online training version? Or the implication of the names is the point?