
That was the accusation, and it was misplaced here. So we agree: in this case it was a smear campaign, not a sensible reaction to a reasonable application of machine-learning algorithms.

The idea that counting adverbs is stealing their work, to the point that they won't want to publish anymore, is clearly FUD, as my remark made clear.



> The idea that counting adverbs is stealing their work, to the point that they won't want to publish anymore, is clearly FUD.

I did not mean that; I am genuinely not sure whether you rephrased my point to make it sound wrong or simply missed it.

My point was that, IMO, it does not matter to the authors whether counting adverbs is stealing their work or not. They would probably be fine if you counted the adverbs manually (and most likely they were fine with such analysis before generative AI).

What matters to them is that generative AI is trained on their copyrighted material, and they fear it (I would, too).

The day people stop reading my blog because they can just ask ChatGPT and get something generated (partly) from my material without any kind of attribution, I can promise you I will stop writing it.


This project was not generative AI. Commenters are saying that this project, which is not at all similar to generative AI, seemed to be okay. But you keep replying to say, essentially, “but if it were generative AI, then authors would have a legitimate reason to be angry”.

There is no need to shoehorn that debate into this particular situation, and I see no merit in defending authors who had a knee-jerk reaction to this project on the grounds that they have reasonable fears about other types of projects.


I think it is not completely off topic. Here is how I see it:

Engineers broadly tend to think that LLMs are not really a problem for copyright holders. At the very least, those who develop LLMs pretty clearly don't give a damn. And on top of that, it is in their interest not to be constrained by copyright.

If that is my feeling (that engineers, globally, don't care about copyright holders), then it seems reasonable to me that non-engineers could feel the same. That sounds fair, doesn't it?

So those people start speaking up when they see a situation where they feel like "it is happening". And because they don't really know the technology, it is hard for them to know if this particular case is a problem or not. And they can't really trust engineers to tell them, because engineers built LLMs in the first place, and really it does not seem like they care about copyright holders.

Finally, engineers see this reaction from authors and, instead of trying to understand where they are coming from, dismiss their opinion. This will probably reinforce the feeling that engineers don't remotely understand these people's concerns and will keep building their AI-powered laundering machines. Again, engineers working on those technologies in big companies have absolutely no interest in even considering that there is a problem: they get a big salary to help their big company become more profitable, even if it kills many jobs and is a net loss for society (because they benefit from it).


To rephrase what you wrote, as I understand it:

1) Some engineers (or more broadly, software developers) do not respect copyright

2) Therefore you are reasonably skeptical of projects involving copyrighted material.

3) It is not always obvious if a project is respectful of copyright.

Now, applying #1, #2, and #3, you believe they justify the outrage over this particular project.

I disagree, because outrage combined with a lack of understanding (#3) is pretty much my definition of a knee-jerk reaction, and it is vastly counterproductive to the interests of copyright holders, because it makes the dismissiveness you predict a self-fulfilling prophecy.


> you believe they justify the outrage for this particular project.

No, I believe it explains it.

> it will make the dismissiveness you predict a self-fulfilling prophecy.

That's the thing: both parties need to listen to each other. The problem here is not this particular project, but the fact that we are not addressing the bigger concern, which is LLMs.

IMHO, it is completely useless to try to solve this particular case, because it will happen over and over again. We need to address the LLM issue.


If you want to join a pitchfork mob against generative AI, at least understand whether the AI in question is generative or not. That seems like a reasonably low bar. This was non-generative AI: it didn't produce content, it output metrics and labelled some existing content.
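To make the distinction concrete, here is a minimal sketch (my own hypothetical illustration, not the project's actual code) of the kind of non-generative analysis being discussed: it only reads existing text and outputs metrics about it, never new prose.

    import re
    from collections import Counter

    def adverb_metrics(text):
        # Crude heuristic: treat long "-ly" words as adverbs. The point is
        # that the output is statistics about the text, not generated content.
        words = re.findall(r"[A-Za-z']+", text.lower())
        adverbs = [w for w in words if w.endswith("ly") and len(w) > 3]
        return {
            "word_count": len(words),
            "adverb_count": len(adverbs),
            "adverb_ratio": len(adverbs) / max(len(words), 1),
            "top_adverbs": Counter(adverbs).most_common(3),
        }

    print(adverb_metrics("She quickly and quietly closed the door."))
    # word_count 7, adverb_count 2, adverb_ratio ~0.29,
    # top_adverbs [('quickly', 1), ('quietly', 1)]

Nothing in that pipeline can reproduce or paraphrase the input; the metrics are the entire output.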


What makes you think that I don't understand whether an AI is generative or not? What I said was that, for artists who are complaining about their copyright being abused, it does not matter. Ten years ago they were not complaining, because AIs looking like ChatGPT (to users, who see it as a black box) did not exist (or were not remotely as powerful).

And I understand that. It is not their job to learn how the black box works. What they see are "machine learning models" (which they probably call "AI" now) that are complete black boxes to them (and that's justified: even the engineers who train them don't know exactly what they do; they test their model on some dataset and judge it from there). And those black boxes are being trained on their copyrighted work and have the potential to generate a ton of money which they will never see.

You can go and say "you guys should learn how the technology works instead of complaining", but let's be honest: you are probably not an expert in AI yourself, and anyway, why should the artists have to care? Their question is totally legitimate: "Why can engineers take my copyrighted work, run it through an algorithm that does things no algorithm has done in history, at a scale never seen before, make money out of it, and not even consider that maybe they are abusing my IP?"

Before dismissing the artists, you should try to understand their point of view.


I would disagree. Just because you don't quite understand something doesn't mean your concerns are not worth consideration. Consider the recent Zoom TOS issue. I doubt that many of us have a deep understanding of how that data is being used, or of the internal guidelines that Zoom follows for its data use, and most people aren't lawyers specializing in IP law who know exactly how the law would treat Zoom if they were to accidentally (or "accidentally") leak IP. We just see that they are putting a clause in their TOS to allow themselves to do so, remember our own heuristics of how LLMs have behaved in the past, and understandably start raising questions.

For all we know, Zoom's AI might be constrained to a framework which doesn't allow such data leaks to occur, or its generative capabilities might be constrained in some other way. They're just demanding legal permission to do so, but that still rubs a lot of us the wrong way. Our concerns are still justified, even if Zoom never actually touches AI.

Artists lack heuristics as concrete as the technical crowd's. But they still have concerns that need addressing, and those concerns about the effects of AI should still be considered and respected. If the details of the situation don't match their concerns, care should be taken to explain to the people in question how they don't match, in a way that isn't looking down on them (admittedly, trying to be the calm voice is often a waste of time on the internet).

That said, if you were to make an informational video which succinctly summarizes the technical details that are relevant to artists, it might become sufficiently popular to influence the debate.


(to clarify, this is a response to skjoldr's comment)


> It is not their job to learn how the black box works

If you have not learned the basics of how something works, you have no right for your opinion on it to be considered valid. Period.

Invalid opinions do harm to democracy and endanger our way of life.


> you have no right for your opinion on it to be considered valid. Period.

That is so wrong it is actually dangerous. Do I need to understand how a nuclear bomb works for my opinion on it to be considered valid? Obviously not. I only need to understand its consequences. It does not matter at all how it works if what I object to is the fact that it will kill a whole lot of people.

> Invalid opinions do harm to democracy and endanger our way of life.

And engineers have done much, much more to endanger most living animals (including humans) than authors and artists: technology is the reason for the mass extinction we are currently living through, and for the problems coming with climate change. Maybe it's important to start thinking about the consequences of what you do, not only the technicalities of how you do it. And maybe it's high time you started listening to people who are able to think about the consequences of what you do (maybe they understand those better than you do, ever thought of that?), even if they don't know how to do it themselves.


You can of course have any opinion you want. But this is not just about the authors having an opinion. It's about them starting a harassment campaign based on faulty facts, with no attempt at verifying them.

If we work from the nuclear bomb analogy, you certainly don't need to be a nuclear physicist to protest nuclear bombs. You just need a reasonably correct high-level understanding of the impact of a nuclear bomb. But that's not what is happening here. This is more like storming the Belgian embassy to stop Belgium from using its nuclear arsenal to trigger a chain reaction in the atmosphere: totally detached from reality in every aspect.

As far as I can tell from your messages on this, you think that the harassment was entirely justified. Is that correct?


> totally detached from reality in every aspect.

I don't think it is totally detached from reality. I believe that engineers are generally pretty bad at anticipating the impact technology will have on society. There are many concerns with generative AI in general: it could potentially "break the Internet" (by finishing off search engines, which already struggle with SEO), or maybe democracy, who knows? Copyright is one such problem.

> you think that the harassment was entirely justified. Is that correct?

I honestly don't know how far it went. What I saw in the article is a few authors who wrote online that they wanted their book removed from that software. Not sure if it is closer to harassment or to lobbying.

What I see, however, is many comments of engineers who don't see the problem with copyright and who don't seem to understand why non-engineers may be against this technology, or why one would even think about forbidding a technology ("but technology is neutral"). My point is just that those engineers should maybe take a step back and try to reflect on that "technology is neutral" belief.


You cannot know the consequences of something if you do not know how it works. Case in point: nuclear reactors. If you do not know how they work, what their potential dangers are, and how those are mitigated by smart design, you do not have a moral right to protest against them. Simple as that. Understanding the risks and consequences equals understanding the system in question. Always. This also applies to nuclear weapons: if you do not understand MAD and how nukes keep other powers in check, and you never had a true threat briefing explaining what exactly nukes are a deterrent against, you just aren't entitled to an opinion on them. Especially one as simple as "they can kill people so I don't want them". This is an invalid opinion, sorry.


If you literally have no idea what a nuclear bomb does, i.e. you don't know that it explodes, releases massive amounts of heat, or can kill many tens of thousands of people at once, then no, your opinion should NOT be considered valid.

Understanding the consequences of something is PART of understanding how it works. Since you understand that it can kill a whole lot of people, I'd say you have passed that incredibly low bar.

In this case, most of the authors do not understand the consequences of the tool: they think it will generate convincing text that sounds like them, or that it is serving pirated copies of their books (sourcing that from the original Twitter thread, which I unfortunately read a lot of).

This doesn't seem like the thread to debate whether technology is a good thing, but I can't help but call this assertion ridiculous. Technology is responsible for almost every single good thing in the world today.


> In this case most of the authors do not understand the consequences of the tool

Because you do? That's my point: engineers believe that because they have some understanding of how machine learning works (and in my experience, that understanding is usually very limited...), they can conclude that they understand its consequences. Simple example: the Facebook "like" button, which was supposed to be positive ("oh nice, I got likes") but actually increases addiction and is mostly negative ("oh no, why did I not get likes?"). Clearly those who implemented the first likes had not realized what consequences they would have.

> Technology is responsible for almost every single good thing in the world today.

If you have a very limited view of the world, I guess it could be. I like trees, flowers, bees, birds, mountains, snow. Can you tell me which of those come from technology? Let me help you: most of them are threatened with dying out this century because of technology. For most living species, every single improvement in technology is bad news. To the point where it is now globally becoming bad news for humans too, because it's quite likely that we will see global instability, wars, and famines in the next few decades because of technology. Think about it when we start having billions of climate refugees, and think about how you dismissed opinions contradicting your beliefs on the grounds that you understand some implementation detail.

But let's even ignore the fact that the next few decades will most likely get pretty bad for us. It is true that right now we live longer, we have more food (and obesity problems), and we can cure many diseases that we could not in the past. Does that mean we are happier? Happier than whom? Vikings? Ancient Romans? Ancient Greeks? That question seems closer to history and philosophy... so why does your opinion count then? Are you a historian or a philosopher?


Because fair use allows transformation, and the output of their algorithm looks nothing like the input of the copyrighted work? For generative models it's more complicated, because generative models can actually reproduce large sections of a copyrighted work, so the transformation is a bit less clear.


> Because fair use allows transformation and the output of their algorithm looks nothing like the input of the copyrighted work?

I feel like you are missing the point of a law. You seem to read the law and say "well, the law says X, new technology Y is compatible with it, so it's legal, everyone is happy". But that is wrong. The law reflects the society we want. Do we want a society that completely kills creative work because Big Tech found a loophole to launder their IP? I guess we all agree that we don't. It is not clear whether LLMs are that loophole, I agree. But you seriously have to take a step back and think about that. What if they are? Then we may have to redefine the meaning of "fair use".

Maybe this particular piece of software was not a danger to those authors. But they don't know that. And given that most engineers talking about LLMs don't seem to remotely understand how one could be worried about them, I understand that the authors start speaking up wherever they can. Because it clearly does not seem like those who build these systems give a damn about copyright holders.


I think the actual issue is nuanced and complicated. I think it's fairly clear that the tool in question, which was non-generative AI, is the kind of thing we want to allow under fair use. Whether we want to allow generative AI is more complex; I'd lean towards requiring a license because of non-deterministic duplication. Fair use is an important part of copyright law and we should be very cautious about eroding it. For example, I like Green Day's transformation of the Scream Icon and think it was substantially different enough that it should be allowed. The courts agreed under current transformation doctrine, but if we weaken the protection for transformative works, we would likely reverse the rulings of cases like that as well.



