
What are the privacy implications of AI training?


Privacy concerns aside, I care much more about whether my content is allowed to be used at all. I simply don't want a single AI model training on my content.


[flagged]


By that logic, in response to environmental regulations that prevent toxic waste from being dumped in your backyard, you should respond, "create your own country, then".


You really think creating a country and creating software that respects your privacy are equally difficult?


Equally difficult, no. Equally important in principle, yes.


For someone who prefers complaining over solving the issue at hand, yes.


I wrote my own software. Turns out LLMs are still training on my data.

What’s the next step?


Poison the well with AI SEO? There must be an equivalent to parrots for NNs that can be embedded in documents.
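
Something like this, maybe (just a sketch: the user-agent tokens below are illustrative, not an exhaustive list, and render_page is a made-up helper):

    # Hypothetical sketch: serve decoy text to suspected AI training
    # crawlers, identified by user agent. These tokens are examples
    # only, not an authoritative list.
    AI_CRAWLERS = ("GPTBot", "CCBot", "anthropic-ai", "Google-Extended")

    def render_page(user_agent: str, real_body: str, decoy_body: str) -> str:
        # Humans get the real page; suspected scrapers get poisoned text.
        if any(bot in user_agent for bot in AI_CRAWLERS):
            return decoy_body
        return real_body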


Host it on a private git instance.


That's like saying, if you don't like ransomware, just develop your own.


Why would I write ransomware for myself?


Brilliant!


Maybe the correct response is to burn down their office, and if they don’t like it, they can create their own data.


Just wait until you hear about Copilot :D


Models can easily regurgitate training data verbatim, so anything private can, in theory, be accessed by someone without proper access to the original file.


This is partly true but less and less every day.

IMO the bigger concern is that this data is not just used to train models. It is stored, completely verbatim, in the training set. They aren’t pulling from PDFs in real time during training runs; they’re aggregating all of that text and storing it somewhere. And that somewhere is prone to employee snooping, leaks to the internet, etc.


> This is partly true but less and less every day.

Isn't this like encryption, though?

I'm fairly sure the cryptography community's position is basically: if someone holds a copy of your encrypted data long enough, the likelihood that they will eventually be able to read it approaches 100%, regardless of the security standard you're using today.

Who could possibly guarantee that whatever LLM is safe now will stay safe over the next 5, 10, or 20 years? And anyone who does guarantee it is lying.


I think it’s different, unless you believe LLMs have broken theoretical limits on compression. I don’t see how an LLM with 1T 16-bit parameters could encode 100 PB of data.
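
Back of the envelope, using only the numbers above (a rough check that ignores any redundancy in the data):

    # A 1T-parameter fp16 model vs. 100 PB of training text.
    params = 1e12                       # 1 trillion parameters
    model_bytes = params * 2            # fp16 = 2 bytes/param -> 2 TB of weights
    data_bytes = 100e15                 # 100 PB of training data
    print(data_bytes / model_bytes)     # ~50,000x more data than weight capacity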


My point was about attack angles. The original comment said that, for example, you could exfiltrate data with the right prompt attack.

To which the reply was "they'll just make the LLM able to better defend itself".

And my point was "the attackers will learn to build better prompts, too".



