On Monday, 12 July 2021 23:16:22 CEST marc wrote:
Hi, me again
So I am going to respond to multiple comments in one go:
I had a look at Julia Reda's post, and as far as I can make out, she only focuses on the fact that individual snippets are very short - but doesn't make any mention that inserting *lots* of snippets algorithmically is *all* that Copilot does...
I imagine that the thinking at GitHub here is that if anyone's code is copied verbatim into something else, those people won't be able to assert their copyright based on some kind of "lack of standing", even if a lot of other people's code is also copied into the final work. Given that this kind of defense worked for actual, substantial, alleged infringement of the Linux kernel code by VMware, as opposed to chaotic copy-pasting of code fragments, I can easily see Microsoft's lawyers feeling confident that GitHub can get away with this, especially if combined with wide-eyed "it's an artificial intelligence" nonsense.
(I really wish people would stop referring to the application of supposed artificial intelligence techniques as *an* artificial intelligence, especially since beyond the breathless hype, few of those people are likely to be bothered with any of the broader ethical considerations at stake, particularly when applications of artificial intelligence do become sophisticated enough to merit concern about matters like the autonomy of such systems themselves.)
In a way her position is understandable - I believe it is consistent with that of the Pirate Party - they are copyright minimalists, and would prefer a world with no copyright, as far as I can tell. But until IP lawyers call themselves TSOALGGM lawyers (that expands to "temporary stewards of a limited government-granted monopoly" rather than "intellectual property") I am not sure if her view is representative of the current situation.
One might agree with her assertion that stronger and more severe copyright laws are not necessarily helpful for Free Software, copyleft in particular, given that copyleft is effectively meant to subvert the copyright regime to promote the fair sharing of software. I also have to say that this thread is the first I've heard of this matter, and since I follow the FSF's announcements (and controversies) fairly actively, I wonder which "copyleft scene" she is referring to. Maybe a bunch of people who have shovelled their code onto GitHub because it was the popular thing to do, plus a bunch of people on proprietary "social media" platforms?
Another commenter said that the codebase used to generate the model is just somehow the "input" and not actually *in* the model - but I am not sure the distinction is that clear. If I ROT13 a Metallica mp3, then there is an algorithmic transformation and the new file is clearly different, but it is possible to recover the original. In the same way it could be argued that the Copilot model encodes the input code in its weightings. I suppose there are some losses, but if I were to downsample and ROT13 a Metallica CD (I don't, I have decided not to like their music), I'd still be in trouble if I claimed it as my own work, right ? And if I XOR it with a Rick Astley mp3, would that suddenly be fair use ?
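To make the reversibility point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions and says nothing about how Copilot itself works: short stand-in strings and byte sequences take the place of real audio files, and the helper names rot13 and xor_bytes are made up for this example. Strictly speaking, ROT13 only rotates letters, so it is applied to text here, while XOR stands in for combining two binary files.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; applying it twice restores the input."""
    return codecs.encode(text, "rot_13")


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice with the same key restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical stand-ins for the "original work" and the "second file".
original_text = "Some copyrighted lyrics"
scrambled_text = rot13(original_text)
recovered_text = rot13(scrambled_text)

payload = b"stand-in for audio bytes"
key = b"a second, unrelated file"
mixed = xor_bytes(payload, key)
unmixed = xor_bytes(mixed, key)

print(scrambled_text != original_text, recovered_text == original_text)
print(mixed != payload, unmixed == payload)
```

Both transformations are their own inverse, so the original remains fully recoverable from the transformed output by anyone who knows the procedure (and, for XOR, holds the second file).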
There are quite a few things in the commentary that other commentators have surely picked apart already, but even skimming the text provides quite a few eyebrow-raising moments. For instance, the revelation that merely reading a book does not infringe copyright might be worth repeating to, say, the music industry, but what copyright is all about is indicated by its name. And traditionally, copying information does not tend to include "copying" it into one's brain via the visual system and other cognitive processes.
However, it is easy to see the top of the slippery slope at this point. What if the software is "an artificial intelligence" (sigh) that is merely being trained? It is then easy to imagine that companies might want to have things both ways (as they do now, but that is another matter): their artificial minion does all the work and isn't infringing anyone's copyright, but the company gets to copyright the output. At the same time, they can plead that it is just a machine and, unlike a human, cannot knowingly plagiarise other people's works.
Another thing that stood out was this: "The output of a machine simply does not qualify for copyright protection – it is in the public domain." Although I recognise that within a specific context, it might be true, it certainly is not unquestionably true beyond that context. A compiler takes source code and produces object code, but that object code is not in the public domain. Having spent the last few months indulging my nostalgia and reading old computing publications, I am reminded of the outrage back in the 1980s when one company produced a compiler for a microcomputer and then claimed that the output was, at least in part, based on their original work:
"Softek compiler payments dispute" https://archive.org/details/popular-computing-weekly-1983-05-26/mode/1up
Even if "computed" output is not subject to copyright, various inputs will be, and as the output is a translation of some input, it may be, too. In the Free Software movement, people are fairly careful about such precedents for good reasons. I say good luck to anyone wanting to test their legal theories by, let us say, publishing a machine-translated version of one of the Harry Potter books.
Finally, for more amusement value: A different conversation points out to me that I should have made my mail more click-baity: "Does Copilot mean that Microsoft has lost its license to distribute the Linux kernel ?" I am still not sure - but maybe this bit of sensationalism makes it clearer what is at stake ?
One might argue that if the tool reproduces code fragments that are big enough to be considered candidates for copyright infringement, then, since they are source code fragments, the only obligation is to ensure that copyright and licensing information is also provided to the user of the tool. That possibly gets GitHub off the hook, but it then leaves the user of the tool to figure out what the status of the resulting work might be. Maybe it should also be generating a REUSE manifest to help that poor end-user.
Paul