But it actually is explicitly copying the text. That's how it works. The training data is massive, and you will get long strings of code pulled directly from that training data. It isn't just giving you the style. It may be mashing together several different code examples, taking some text from each. That's called a "derivative work".
"[...] the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set"
If that's the case (only 0.1%), the developers must have done something differently from other OpenAI code-suggestion experiments I recall seeing, where significant chunks of code from Stack Overflow or similar sites appeared verbatim in answers.
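For what it's worth, measuring "verbatim from the training set" is itself a design choice. A minimal sketch of one plausible approach, token n-gram overlap against a reference corpus, is below. The tokenization, the 8-token threshold, and the function names are all my own illustrative assumptions, not how Copilot's actual filter works.

```python
# Hypothetical sketch: flag a generated snippet as "verbatim" if any
# run of n consecutive tokens also appears in the reference corpus.
# The threshold n=8 and whitespace tokenization are assumptions for
# illustration only.

def ngrams(tokens, n):
    """Return the set of all consecutive n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(snippet, corpus, n=8):
    """True if any n-token run of `snippet` appears verbatim in `corpus`."""
    return bool(ngrams(snippet.split(), n) & ngrams(corpus.split(), n))

corpus = "static int parse_header ( struct buf * b ) { if ( ! b ) return - EINVAL ; }"
generated = "int parse_header ( struct buf * b ) { if ( ! b ) return - EINVAL ; return 0 ; }"
print(verbatim_overlap(generated, corpus))  # shared 8-token run -> True
```

The choice of n matters a lot: a small n flags boilerplate that everyone writes (loop headers, null checks), while a large n misses lightly edited copies, which is one reason a headline figure like "0.1%" depends heavily on how the measurement was defined.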
How are you going to prove it was the AI that generated the GPL-licensed function verbatim from another project, rather than you just opening that project and copying the function yourself?
Synthesising material from various sources isn't copyright infringement; that's called writing.
It's only infringement if the portion copied is significant either absolutely or relatively. A line here or there of the millions in the Linux kernel is okay. A couple of lines of a haiku is not. Copyright is not leprosy.
We don't all have Google's resources. What if someone comes after us individually because some model-generated code is near identical to code in a GPL codebase? Where does the liability fall?
> What is my responsibility when I accept GitHub Copilot suggestions?
> You are responsible for the content you create with the assistance of GitHub Copilot. We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself.
We are all vulnerable to predatory lawyer trolls, whether we do things correctly or not. If you are accused of reusing GPL code, you ask for clarification about which code, and you rewrite it. It is likely to be just a snippet. I doubt Copilot would write a whole library by copying it from another project.
And yes, of course GitHub is not going to take responsibility for things you do with their tools.
If you learn programming from Stack Overflow and GitHub, and then repeat something that you learned over your time reading, that's not just copying text. That's having learned the text. You could say the human brain is mashing together several different code examples, taking some text from each.
Wouldn't that imply that a person who learned to code on GPLv2 sources and then writes more code in that style (including "long strings of code", some of which are clearly not unique to the GPL) is writing code that is "born GPLv2"?