The Claude Code privacy policy[0] is pretty explicit that, by default, they don't train on your prompts, your usage data, or even explicitly provided feedback data (presumably from /bug?), though that feedback can still be used for other product improvements.
> By default, Anthropic does not train generative models using code or prompts that are sent to Claude Code.
> We aim to be fully transparent about how we use your data. We may use feedback to improve our products and services, but we will not train generative models using your feedback from Claude Code.
[...]
> If you choose to send us feedback about Claude Code, such as transcripts of your usage, Anthropic may use that feedback to debug related issues and improve Claude Code’s functionality (e.g., to reduce the risk of similar bugs occurring in the future). We will not train generative models using this feedback. Given their potentially sensitive nature, we store user feedback transcripts for only 30 days.
As for the value they place on that data, they do have a program where you can opt in to have your data used for training[1] in exchange for a discount on the API rates.
As a former big tech engineer, I can't help but come up with a gazillion ways to work around these sorts of seemingly straightforward policies.
Here's one way they could get around their own privacy policy: keep track of what % of Claude-generated code is retained in the codebase over time (as an indicator of how high-quality / bug-free the code was); A/B test variations of Claude Code to see which variations have higher retention percentages.
No usage data is retained, no code is retained, and nothing is used other than a single floating-point number per session, yet they still get to improve their product on top of your usage patterns.
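Purely as a hypothetical sketch of what that could look like (none of these names or mechanics are anything Anthropic has described): the client computes one retention number per session locally, and the only thing the vendor ever sees is a per-variant average.

    # Hypothetical sketch: measure how much Claude-generated code survives in a repo,
    # then compare retention across A/B variants. Function and field names are invented
    # for illustration; this is not Anthropic's actual telemetry.
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Session:
        variant: str               # which Claude Code variant served this session
        generated_lines: set[str]  # hashes of lines the model wrote
        surviving_lines: set[str]  # hashes of those lines still in HEAD some weeks later

    def retention(session: Session) -> float:
        """Fraction of generated lines still present later; a single float leaves the machine."""
        if not session.generated_lines:
            return 0.0
        kept = session.surviving_lines & session.generated_lines
        return len(kept) / len(session.generated_lines)

    def compare_variants(sessions: list[Session]) -> dict[str, float]:
        """Average retention per variant -- the only aggregate the vendor would need to see."""
        by_variant: dict[str, list[float]] = {}
        for s in sessions:
            by_variant.setdefault(s.variant, []).append(retention(s))
        return {variant: mean(scores) for variant, scores in by_variant.items()}

    if __name__ == "__main__":
        demo = [
            Session("control", {"a", "b", "c"}, {"a", "b"}),
            Session("treatment", {"x", "y"}, {"x", "y"}),
        ]
        # prints per-variant averages, e.g. control ~0.67, treatment 1.0
        print(compare_variants(demo))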
Here's another idea: use a summarization model to transform your session transcript into a set of bits ("user was satisfied/dissatisfied with this conversation", "user indicated that Claude was doing something dangerous", "user indicated that Claude was doing something overly complicated / too simple", "user interrupted Claude", "user indicated Claude should remember something in CLAUDE.md", etc. etc.), then train on these auxiliary signals without ever seeing the original code or usage data.
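Again purely speculative, but here is the shape of it: everything below would run client-side, with crude keyword matching standing in for the summarization model, and only the booleans would ever leave the machine. The signal names and heuristics are all made up.

    # Hypothetical sketch: reduce a session transcript to a handful of coarse booleans
    # locally, so only these bits (never the code or prompts) are sent upstream.
    # A real system would presumably use a small summarization model, not string matching.
    from typing import TypedDict

    class SessionSignals(TypedDict):
        user_satisfied: bool
        flagged_dangerous: bool
        flagged_overcomplicated: bool
        interrupted_model: bool
        updated_claude_md: bool

    def extract_signals(transcript: str) -> SessionSignals:
        """Crude keyword stand-in for the summarization step described above."""
        text = transcript.lower()
        return SessionSignals(
            user_satisfied="thanks" in text or "great" in text,
            flagged_dangerous="dangerous" in text or "don't run that" in text,
            flagged_overcomplicated="too complicated" in text or "simpler" in text,
            interrupted_model="[interrupted]" in text,
            updated_claude_md="claude.md" in text,
        )

    if __name__ == "__main__":
        print(extract_signals("Great, thanks! Also please add this rule to CLAUDE.md."))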
I always get a kick out of the sheer number of HNers with deep concern about “training on their data” while hacking a CRUD boot service with a Next.js front-end :)
[0] https://docs.anthropic.com/en/docs/claude-code/data-usage
[1] https://support.anthropic.com/en/articles/11174108-about-the...