I think the main point is how much of a copyright issue it is
to train a machine learning model that can recreate the original
with a specific percentage of similarity by chunks (and how to
define what a chunk means for code).
For example, if the GPL said that you can use 30% of the code
lines to generate a model (I don't know a better legal term),
I think everything would be solved.
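To make the "percentage of similarity by chunks" idea concrete, here is a minimal sketch. It assumes that a chunk is a run of consecutive non-blank lines, compared after trimming whitespace. That definition, the function names and the chunk size are only my assumptions for illustration, not a legal or established measure.

```python
def normalized_lines(source):
    """Return the non-blank lines of a source text with whitespace trimmed."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def chunks(lines, size):
    """Return the set of all runs of `size` consecutive lines."""
    return {tuple(lines[i:i + size]) for i in range(len(lines) - size + 1)}


def chunk_similarity(original, generated, size=3):
    """Fraction of the original's chunks that reappear verbatim in the generated text.

    Assumption: a "chunk" is `size` consecutive non-blank lines.
    """
    original_chunks = chunks(normalized_lines(original), size)
    if not original_chunks:
        return 0.0
    generated_chunks = chunks(normalized_lines(generated), size)
    return len(original_chunks & generated_chunks) / len(original_chunks)


if __name__ == "__main__":
    original = "a = 1\nb = 2\nc = a + b\nprint(c)\n"
    generated = "a = 1\nb = 2\nx = 0\nc = a + b\nprint(c)\n"
    print(f"{chunk_similarity(original, generated, size=2):.0%} of chunks reproduced")
```

A license threshold such as the 30% above could then be checked against this number, but a real rule would have to deal with renamed variables and reformatting, which this sketch ignores.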
This is a new area for lawyers, and also for the OSI, to discuss. We cannot do anything about it ourselves, as it isn't covered and there are many differences between countries.
Honestly, I think the issue here is GitHub's behaviour: even if
they are allowed to do something with the source that users upload
there, they are not releasing the list of repositories used, or
specifying the projects by license (or whether they are private or
public). Nor did they ask for permission, and this is splitting
the dev community, who are their customers after all.
We cannot forget that we don't know whether they also used
repositories with no license, which are de facto proprietary, and
this happens a lot on GitHub.
My personal notes about abandoning GitHub as a protest: I have had
everything on GitHub since 2012, with about 46 repositories on my
profile (not counting the various organizations and forks), and
it is not easy to migrate to a different platform and change all
the references to them. There is also the issue of the many forks
that aren't on GitHub, so it is not possible to contribute to them.
So I think the only way is to force GitHub to do something,
like specifying more clearly what they are doing and how.
The FSFE, and maybe also the FSF, could think about creating a
campaign to ask GitHub to make those changes.
If I ROT13 a Metallica mp3, then there is an algorithmic transformation and the new file is clearly different, but it is possible to recover the original. In the same way, it could be argued that the Copilot model encodes the input code in its weights. I suppose there are some losses,