I think that the point is mainly how much means a copyright issue to train a machine learning model that can recreate the original with a specific percentage of similarity bu chunks (and define what means a chunk for code).
As example if GPL license says that you can use the 30% of code lines for usage to generate a model (I don't know a better lawyer term) I think that everything will be solved.

This is a new area for lawyers to discuss and also for OSI and we cannot do anything for that as it isn't covered and there are difference between countries a lot.

Honestly I think that the issue there is the GitHub behaviour that if also they are allowed to do something with the source the users upload there they are not releasing the list of repositories used or specified projects by license (if they are private or public too). Or just asked for permissions for it as it is splitting the dev community that are their customers after all.
We cannot forget that we don't know if they used also repositories with no licenses that are the facto proprietary and this happens a lot on GitHub.

My personal notes about abandoning GitHub as protest. I have everything on GitHub since 2012 with like 46 repositories on my profile (not talking about the various organizations and forks) and is not easy to migrate to a different platform and change all the reference to those. Also there is the issues of the many forks that aren't on GitHub so is not possible to contribute.
So I think that the only way is forcing GitHub to do something, like specify more clearly what they are doing and how.

FSFE and maybe also FSF can think on creating a campaign to asks to GitHub to do that changes.

Daniele Scasciafratte - OpenSource MultiVersal Guy
daniele.tech - @Mte90Net - GitHub - Italian Linux Society council member - Mozillian
Mozilla Reps, Mozilla TechSpeakers, WordPress Core Contributor, FSFE member,
LibreItalia member, Wikimedia Italia member and LUG Rieti founder.

Il 12/07/21 23:16, marc ha scritto:

If
I ROT13 a Metallica mp3, then there is an algorithmic
transformation and new file is clearly different, but it
is possible to recover the original. In the same way it
could be argued that the copilot model encodes the input
code in its weightings. I suppose there are some losses,