Hacker News | greensoap's comments

Cost -- it is way cheaper to use an IPR and avoid the discovery associated with the other factors that happen at trial. Speed -- the PTO is generally faster.


It's not really about cost for most people who really like the IPR. It's a way to get a second chance to invalidate a patent and a way to drag out litigation. The cost of an IPR is actually not that much less than the cost of invalidation during a trial (although it saves you on other discovery because you can often stay the trial during your IPR), but it's a second invalidation path you can take at the same time.


The reality is that you get multiple bites at the apple. You challenge at trial and in an IPR. Lose both, challenge with an EPR and appeal the trial. IPRs were initially created to simplify trial and cost, but once the estoppel provisions were determined not to have much teeth, it just became something you did because there was no downside.


There is a fairly vocal contingent of patent people on LinkedIn swearing this is good for the solo guy, the small independent inventor. But yes, it does feel like it will be trolls that are in favor -- maybe some pharma wants this.


Anything "good for trolls" is good for the "small guy" because anything that's not "good for trolls" is good for the "big guy" and anything good for the "big guy" is not good for the "small guy"


Except patent trolls do not strictly go after big guys. In fact, quite the opposite. They first go after little guys who cannot afford to defend themselves, and - after racking up a series of victories - only then do they go after the big guys. Patent trolls are bad for everyone.


I did not state that "patent trolls" "strictly" go after "big guys"


There was an extension. I don't have a link handy, but an extra 15 days were provided.


Genuinely curious, I don't see anything about hardware. Isn't this about making a login that doesn't require you to log in to MS's cloud? Also, what HW restriction does Microsoft want? Why do they care?


Windows 11 requires TPM 2.0, that's the actual reason a massive number of PCs can't update to it. There's apparently some way you can hack around that and install it. I assumed that's what these videos were about. But from the reddit post it looks like it's talking about both that and the account login issue which I wasn't familiar with.

> including how to install Windows 11 without logging into a Microsoft account and how to install Windows 11 on unsupported hardware.


Yes this is about a local login, it has nothing to do with hardware.


Actually, the court really only said downloading a pirated book to store in your "library" was bad. The opinion is (intentionally?) ambiguous on whether the decision regarding copies used to train an LLM applies only to scanned books or also to pirated books. The facts found in the case are that the training datasets were made from the "library" copies of books, which included scans and pirated downloads. And the court said the training copies were fair use. The court also said the scanned library copies were fair use. The court found that the pirated library copies were not fair use. The court did not say for certain whether the pirated training copies were fair use.


A point of clarification and some questions.

The portion the court said was bad was not Anthropic getting books from pirated sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirated sites or hard copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general purpose library.

--

  "To summarize the analysis that now follows, the use of the books at issue to train Claude
  and its precursors was exceedingly transformative and was a fair use under Section 107 of the
  Copyright Act. And, the digitization of the books purchased in print form by Anthropic was
  also a fair use but not for the same reason as applies to the training copies. Instead, it was a
  fair use because all Anthropic did was replace the print copies it had purchased for its central
  library with more convenient space-saving and searchable digital copies for its central
  library — without adding new copies, creating new works, or redistributing existing copies.
  However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
  permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy."

  "Because the legal issues differ between the *library copies* Anthropic purchased and
  pirated, this order takes them in turn."

--

Questions

As an author, do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are excused. In a pure sense, if the AI company copied (assuming they didn't torrent back the book) from a "pirate source", why is that copy worse than if they copied from a hard copy of the book?


> AI company buys a book and scans it, they are reproducing the book without a license, correct

isn't digitizing your own copies as backups and for personal use fine? so long as you don't give away the original while keeping the backups. similarly, don't give away the digital copies.


It is, Google Books did it over a decade ago (bought up physical books and scanned them all). There were some rulings about how much of a snippet they were allowed to show end users as fair use, but I'm fairly sure the actual scanning and indexing of the books was always allowed.


> If the AI company buys a book and scans it, they are reproducing the book without a license, correct?

No? I think there are a lot more details that need to be known before answering this question. It matters what they do with it after they scan it.


That is only relevant to whether it is fair use, not to whether the copying is an infringement. Fair use is what is called an affirmative defense -- it means that yes, what I did was technically a violation, but it is forgiven. So on technicalities the copying is an infringement, but that infringement is "okay" because there is a fair use. A different scenario is if the copyright owner gives you a license to copy the work (like open source licenses). In that scenario the copying was not an infringement because a license exists.


> Fair use is what is called an affirmative defense

Yes

> it means that yes what I did was technically a violation but is forgiven

Not at all. All "affirmative defence" means is that procedurally the burden is on me to establish that I was not violating the law. The law isn't "you can't do the thing", rather it is "you can't do the thing unless it's like this". There is no violation, and there is no forgiveness as there is nothing to forgive, because it was done "like this" and doing it "like this" doesn't violate the law in the first place.


If I have an app on my phone that lets me point my phone at a page to scan, OCR, and read the page out loud to me, it wouldn't even require fair use, would it?


Anthropic legally purchased the books it used to train its model according to the judge. And the judge said that was fine. Anthropic also downloaded books from a pirate site and the judge said that was bad -- even though the judge also said they didn't use those books for training....


Anthropic literally did exactly this to train its models according to the lawsuit. The lawsuit found that Anthropic didn't even use the pirated books to train its model. So there is that


The lawsuit didn't find anything, Anthropic claimed this as part of the settlement. Companies settle without admission of wrongdoing all the time, to the extent that it can be bargained for.


The judge's ruling from earlier certainly seemed to me to suggest that the training was fair use.

Obviously, that's not part of the current settlement. I'm no expert on this, so I don't know the extent to which the earlier ruling applies.


If I'm reading this right yes the training was fair use, but I was responding (unclearly) to the claim that the pirated books weren't used to train commercially released LLMs. The judge complained that it wasn't clear what was actually used, from the June order https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/... [pdf]:

> Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.

> We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April. A discovery dispute regarding that spreadsheet remains pending.


Thanks for this info. I was looking for which pirated books were used for which model.

Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?

To me, the taint still remains. Which is a shame, because it's considered the best coding model so far.


> Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?

No, in part because it removes agency from the authors/rightsholders. Maybe they don't want to sell Anthropic their books, maybe they want royalties, etc.


Can authors even claim such rights though? I doubt they even had such agency to begin with.


If they're the rightsholders, they can do whatever they want with their IP, including changing licensing terms, adding contractual obligations forbidding certain types of use, forbidding sale, etc.


I feel like that would bounce hard off first sale doctrine. But what do I know.


You still have to adhere to license and copyright terms after first sale.

You can't sell a Blu-ray disc to a movie theater and give them the right to charge an audience to watch it in the theater later.

If rightsholders are worried about certain uses of their IP being found to be fair use, they might then change the terms of release contractually to stop or at least partially prevent training.


They stated it in court in their papers for summary judgment on the issue of fair use. My gosh! To pretend like you know what you're talking about but missing that detail?


I'm "team Anthropic" if we're stack ranking the major American labs pumping out SOTA models by ethics or whatever, but there is no universe in which a company like them operating in this competitive environment didn't pirate the books.


"ethics or whatever" seems like a good tagline for people rooting for an AI company when it's being sued by authors.


Makes sense why Effective Altruism is so popular. Commit crime, make billions, give back when dead, live guilt free?


Except for Google at least.


In a prior ruling, the court stated that Anthropic didn't train on the books subject to this settlement. The record is that Anthropic scanned physical books and used those for training. The pirated books were being held in a general purpose library and were not, according to the record, used in training.


So how did they profit off the pirated books?


According to the judge, they didn't. The judge said they stored those books in a general purpose library for future use just in case they decided to use them later. It appears the judge took much issue with the downloading of "pirated content." And Anthropic decided to settle rather than let it all play out more.


But how was the settlement cost then defined if nobody read those books and there was no financial loss...


That is something which is extremely difficult to prove from either side.

It is 500,000 books in total so did they really scan all those books instead of using the pirated versions? Even when they did not have much money in the early phases of the model race?


The 500,000 number is the number of books that are part of the settlement. If they downloaded all of Libgen and the other sources, it was more like 7 million or more. But it is a lot of work to determine which books can legitimately be part of the lawsuit. For example, if any of the books in the download weren't copyrighted (think self-published), or weren't protected under US copyright law (maybe a book only published in Venezuela), or it isn't clear who owns the copyright, then that copyright owner cannot be part of the class. So it seems like the 500,000 number is basically the smaller number of books for which the lawyers for the plaintiffs felt they could most easily prove standing.


There is a difference between localized inflammation that is bringing the source of healing to injury and systemic inflammation

