
I have many GPL projects (e.g. https://github.com/rochus-keller/Oberon, https://github.com/rochus-keller/Luon, https://github.com/rochus-keller/Micron) and spend a significant amount of time on them. The GPL has always explicitly permitted commercial use; that's a feature, not a bug, dating back to Stallman's original vision. Any person or company can use my code (or Kefir code) under the terms of the GPL, just as I use code given away for free by companies under the GPL or even more liberal licences. That's the deal. The GPL is a license explicitly designed to maximize use, so it doesn't make sense to object to a specific form of use. The claim that AI companies are somehow violating the GPL by training on GPL code is legally baseless (I studied law here in Switzerland and had lectures about international IP law); the FSF itself has not claimed otherwise either; and even if it were prohibited, it would be a copyright enforcement problem, not a reason to stop publishing. I don't know Kefir, but it looks like a great (even optimizing) compiler. So it's really a pity that its development is no longer open source.



The GPL, unlike the BSD and such, intends to prevent the closing of distributed derivative works. LLMs trained on GPL code can produce derivative works without any enforcement mechanism.

You may be fine with that, but the GPL is not a public domain license, and LLM training treats all things as if they were public domain.


> LLMs trained on GPL code can produce derivative works

This confuses two completely separate things. The GPL governs distribution of derivative works. An LLM trained on GPL code does not distribute that code. The model weights are not a copy, a derivative, or a distribution of the training data in any legally recognizable sense; "influenced by" is not "derived from". The enforcement argument is a non sequitur; the GPL has never had a technical enforcement mechanism; it's always been legally enforced after the fact by copyright holders who discover violations. So if an LLM did indeed produce output sufficiently similar to my code and someone published it in violation of the GPL, I would have the same legal means to enforce my rights as if the code had been copied by a human.


> An LLM trained on GPL code does not distribute that code.

You can't simply make that assertion. You'll have to prove that LLMs do not actually contain encoded copies of copyrighted code and that they are incapable of reproducing such code verbatim.

There is no evidence for such a claim, and so your entire argument is completely baseless.


> You'll have to prove that LLMs do not actually contain encoded copies

In law, the presumption is that an act is lawful unless proven otherwise. The burden lies on whoever claims a violation occurred. I already went into the case of sufficiently similar reproduction in my previous response.


I mean… it's been common knowledge for a while that they do in fact contain the original data.

https://www.reddit.com/r/programming/comments/oc9qj1/copilot...

You can disagree all you want, but there's ample evidence of this.


> GPL is a license explicitly designed to maximize use

I feel this is a misrepresentation. GPL rather seems designed to maximize source availability for users.

But mandatory public source availability does make selling software products more difficult ("why would anyone pay if they can just use the source"), which is why most commercial software products still sell and ship binaries when they can.


> designed to maximize source availability

Right. It depends on what you mean by "use"; GPL maximizes use in the sense that it prevents anyone from taking the code proprietary and thereby restricting future users' access. But it doesn't touch my actual point, which is that GPL explicitly permits commercial use, broad distribution, and also LLM training (none of which are restricted by the license). The source availability requirement is the condition, not a restriction on who can use the code.

> why would anyone pay if they can just use the source

Red Hat, Qt, and countless others have built commercial businesses on GPL code. So apparently there is a business and people willing to pay even if the source code is available. But that was not my point anyway.


I can see your point, and I do agree that people often cry about "but copyright" in LLM contexts when simply not appropriate.

But I can still understand the Kefir author; if you previously defaulted to GPL (over MIT/BSD) mainly because you wanted to foster an open-source software development culture, then the emergence of LLMs might well be a turning point where publishing your project no longer makes sense to you; thanks to LLMs, publishing your open-source project might do more for commercial closed-source actors (via better-trained LLMs used by the developers they employ) than for open-source developers (or open-source culture overall).

Instead of potentially creating a valuable GPL project that pushes other users towards GPL/open source, you might end up making your project a sort of "commodity" easily available to all closed-source developers for a moderate cost in tokens...


It's his code and of course he can decide what to do with it. I just think it's a shame when people make hasty, ill-considered decisions based on misunderstandings and false assumptions. Copyright law is (unfortunately) much more complicated than most people realize. There's also an irony worth noting: many of the same developers who expect free access to compilers, libraries, and frameworks built by companies and use them without a second thought, are the ones loudest about 'exploitation' when the flow goes the other direction. Open source is a two-way street. Personally, I think it's great when people (both individuals and companies) use my software and I've been able to make a contribution to society through it.

Do you think that the author is deciding based on misunderstanding/false assumptions?

He explicitly states that the AI training concerns are not about legal GPL violations but about going against his licensing intentions (and those seem very much in line with the "copyleft spirit" from what I can tell).

My take is that the emergence of LLMs "threatens" the whole copyleft framework in a way similar to cloud services in the past (which led to the AGPL): closed-source development can extract a lot of value from copyleft projects without contributing back in any way (neither to upstream nor to their own users).


Yes, I think the reaction is disproportionate. The philosophical concern is understandable, but withdrawing the project doesn't address it; it just removes something valuable from people who aren't responsible for the problem he perceives.

And companies are ultimately owned by people, including ordinary savers whose pension funds depend on them, and they employ people, so they contribute to society.



