> While the AI Act’s references to copyright issues in generative AI are still very vague and only stress how much of a grey area it is, requiring providers of large models to be more transparent about their sources seems not a bad thing as such. As with many aspects of the act, it remains to be seen how this works out in practice.
If my license prohibits use of my work for AI training, or requires that any modified code include my license or credits, or I lack a license, or my blog doesn't give you permission to train on my content, then you shouldn't use it. Google tried hijacking content with AMP, and AI is no different. If you violate my terms, I want to be able to submit evidence - or suspicion - to a government agency that audits you or fines you into oblivion. Ideally you would have to pay damages proportional to the number of people you may have sold my content to, in full or in part.
This would lead to a win-win setup. Artists, developers, writers, lawyers and so on would be compensated for training content - one-time or ongoing - leading to higher-quality models, job growth, and a superior AI product overall.
AI is, by and large, a net positive, but it needs to be done right.
This feels like (yet another) extension of copyright. Whilst I'm not sure I completely disagree with you, I want people to acknowledge that copyright is not the natural state of the universe. Prior to (I think) 1790 there was no copyright and human beings managed minor things like, you know, the renaissance and stuff like that.
Copyright was invented and enforced and the results have been a mixed bag. It seems to suffer from a ratchet effect where the law only ever increases the scope to which copyright applies and never decreases it.
However intuitive your sense of your moral rights is, what matters is the net benefit to society, and we should be very careful what we wish for.
If creating LLMs based on copyrighted data is found to be legal, all that will do is allow giant companies to sell copyrighted work without crediting the original authors, while leaving everyone else in the dirt.
> all that will do is allow giant companies to sell copyrighted work without crediting the original authors, while leaving everyone else in the dirt.
I'm not sure I follow. But even accepting your premise, I'm not sure how it would favour giant companies over anyone else. The models are already in the wild and anyone can use them. In some ways, large companies are less likely to do anything that might open them up to legal risks or PR downsides.
Maybe this is more of a Napster moment than it is a big tech powergrab?
GPT is owned by Microsoft, LLaMA by Facebook, and Bard by Google. If you trained a model on Google's public properties and started distributing it (or its output) for money, you'd be sued into oblivion real quick.
My point was that the models exist, people are fine tuning them and/or releasing open clones. There are models of comparable power to the state of the art without any controlling interest from a big tech company.
The Google memo covered this in detail, and it's what makes me question the "AI is owned by big tech" angle.
Price/profit != value. Sure, Hollywood movies bring in a ton of money, but I get way more value from daily indie YouTubers than from a blockbuster released once a month.
> Prior to (I think) 1790 there was no copyright and human beings managed minor things like, you know, the renaissance and stuff like that.
I'm curious whether the introduction of copyright is what led to an explosion of products and innovation: suddenly people had an incentive to monetize their ideas. I doubt the renaissance happened due to a lack of copyright; I think it was more due to social, political, and health circumstances than to the absence of protection for one's work. We in Europe suffered from disease, famine, and war to the point where we concluded that enough is enough - we need rules to the game.
There doesn’t seem to be evidence that copyright increases innovation. Indeed in some areas with no IP protection we actually see more innovation (example: fashion)
> it's about the net benefit to society and we should be very careful what we wish for.
Seems like we have a classic trolley problem.
On one track, compensating copyright holders is required for LLMs, and it's going to be very expensive to acquire all of this copyrighted info, meaning only the biggest companies can afford to do it.
On the other track, compensating copyright holders is not required, LLMs (led by big tech) capture most of the economic value from every incremental piece of content created by humans in perpetuity, consolidating wealth in the hands of a few shareholders and insiders.
> On one track, compensating copyright holders is required for LLMs, and it's going to be very expensive to acquire all of this copyrighted info, meaning only the biggest companies can afford to do it.
There is also a third track: most of the abundant training data is open-source code or unlicensed content (which is still protected by copyright in the US, afaik). If corporations can't monetize it, we win, because models either need to be open source or there needs to be payment for training.
I'm not sure it's certain yet whether AI is going to lead to more consolidation or actually have the opposite effect.
Whilst history tends to make me suspect the former, the recent leaked Google memo gave me pause for thought. AI is already out there and can already be trained on consumer hardware. It's ever so slightly possible that big tech won't be able to hoard the benefits this time.
Open source models are possible if we pick the second option. Lots of innovation in the AI scene is happening thanks to open source models being available to the general public.
If someone wants to learn from your work, they will, and there's nothing you can do about it. I can train a LoRA on someone's art style in a few hours on my PC; rent a GPU and you can do it in under an hour. It takes more time to curate, process, and caption the images than to actually train the AI. It's that easy. So yeah, the cat is out of the bag and you will have to adapt.
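To give a sense of why this is so cheap: LoRA doesn't retrain the model's weights at all. It freezes them and learns a small low-rank correction on top. A minimal numpy sketch of the idea (the layer width `d` and rank `r` here are hypothetical, not taken from any particular model or training script):

```python
import numpy as np

d, r = 4096, 8                      # hypothetical layer width and LoRA rank
W = np.random.randn(d, d)           # frozen pretrained weight (never updated)
A = np.random.randn(r, d) * 0.01    # trainable down-projection (d -> r)
B = np.zeros((d, r))                # trainable up-projection (r -> d), starts at zero

# Effective weight during fine-tuning; only A and B receive gradients.
W_eff = W + B @ A

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)    # 2*r*d / d^2 = 0.00390625, i.e. ~0.4%
```

With only ~0.4% of the layer's parameters trainable, the optimizer state and gradients shrink proportionally, which is what makes fine-tuning feasible on a single consumer GPU in hours rather than on a cluster.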
I can download a whole set of movies even though the piracy cat's been out of the bag for a while. But if I monetised it, I'd be in a hell of a lot of trouble. My debate points are not about stopping AI; they are about how we shape its use as a tool.
The difference is that AI training is not illegal and styles aren't copyrightable. So you can already make stuff in the style of some other artist and then sell it.
> This would lead to a win win setup. Artists, developers, writers, lawyers and so on would need compensation for training content - one time or ongoing - leading to higher quality models, job growth and a superior ai product over all.
I have been told information should be free, though.
According to terms and conditions. IP is not information, it's work. Pay for it or don't use it. I decide the terms of my own output, not you, and certainly not a corporation that resells it and threatens me with unemployment.
> I decide the terms of my own output not you, and certainly not a corporation that resells it and threatens me with unemployment.
Actually, the government decides the terms of your output by passing laws. Your legal right to your content is whatever the law allows. If copyright legislation were revoked tomorrow, you'd be howling into the void.
What kind of tyranny would allow a handful of corporations to grab my work for free and resell it while putting me out of work? As a citizen, I want my work protected, and I equally want corporations to be able to benefit from it and compensate me if I so wish.
Anti-copyright is a bit like communism. What's the plan? That we all live in one happy commune while the politburo owns everything and we starve in the name of glorious progress? We tried it before and it didn't work.