AI assistance when contributing to the Linux kernel

475 points


a day ago

by hmokiguess

Comments

qsort

Basically the rules are that you can use AI, but you take full responsibility for your commits and code must satisfy the license.

That's... refreshingly normal? Surely something most people acting in good faith can get behind.

a day ago

pibaker

I agree this is very sane and boring. What is insane is that they have to state this in the first place.

I am not against AI coding in general. But there are too many people "contributing" AI generated code to open source projects even when they can't understand what's going on in their code just so they can say in their resumes that they contributed to a big open source project once. And when the maintainers call them out they just blame it on the AI coding tools they are using as if they are not opening PRs under their own names. I can't blame any open source maintainer for being at least a little sceptical when it comes to AI generated contributions.

19 hours ago

theptip

I think them stating this very simple policy should also be read as them explicitly not making a more restrictive policy, as some kernel maintainers were proposing.

15 hours ago

Applejinx

From everything I'm seeing in the industry (I'm basically a noncoder choosing to not use AI in the stuff that I make, and privy to the private work experience of coders and creators also in that field because of human social contacts), I feel like I can shed a bit of light.

It looks to me like a more restrictive policy will be flat-out impossible.

Even people I trust are going along with this stuff, akin to CAD replacing drafting. Code is logic as language, and starting with web code and rapidly metastasizing to C++ (due to complexity and the sheer size of the extant codebase, good and bad) the AI has turned slop-coding into a 'solved problem'. If you don't mean to do the best possible thing or a new thing there is no excuse for existing as a coder in the world of AI.

If you do expect to do a new thing or a best thing, in theory you're required to put out the novel information as AI cannot reach it until you've entered it into the corpus of existing code the AI's built on. However, if you're simply recombining existing aspects of the code language in a novel way, that might be more reachable… that's probably where 'AI escape velocity' will come from should it occur.

In practice, everybody I know is relegating the busywork of coding to AI. I don't feel social pressure to do the same but I'm not a coder. I'm something else that produces MIT-licensed codebases for accomplishing things that aren't represented in code AS code, rather it's for accomplishing things that are specific and experiential. I write code to make specific noises I'm not hearing elsewhere, and not hearing out of the mainstream of 'sound-making code artifacts'.

Therefore, it's impractical for Linux to take any position forbidding AI-assisted code. People will just lie and claim they did it. Is primitive tab-complete also AI? Where's the line? What about when coding tools uniformly begin to tab-complete with extensive reasoning and code prototyping? I already see this in the JetBrains Rider editor I use for Godot hacking, even though I've turned off everything I can related to AI. It'll still try to tab-complete patterns it thinks it recognizes, rarely with what I intend.

And so the choice is to enforce responsibility. I think this is appropriate because that's where the choices will matter. Additions and alterations will be the responsibility of specific human people, which won't handle everything negative that's happening but will allow for some pressures and expectations that are useful.

I don't think you can be a collaborative software project right now and not deal with this in some way. I get out of it because I'm read-only: I'm writing stuff on a codebase that lives on an antique laptop without internet access that couldn't run AI if it tried. Very likely the only web browsers it can run are similarly unable to handle 2026 web pages, though I've not checked in years. You've only got my word for that, though, and your estimation of my veracity based on how plausible it seems (I code publicly on livestreams, and am not at all an impressive coder when I do that). Linux can't do what I do, so it's going to do what Linux does, and this seems the best option.

9 hours ago

alfiedotwtf

You can refuse to use AI personally, but why would you not help yourself when you can?

… my dad is 86 and only after I signed him up to Claude could he write Arduino code without a phone call to me after 5 minutes of trying himself. So now, he’s spending 4+ hours at a time focused writing code and building circuits of things he only dreamt about creating for decades.

Unless you’re doing something for the personal love of the craft and sharpening your tools, use every advantage you can get in order to do the job.

But… as above, if you’re doing it for the love of it, sure - hand crafted code does taste better and you know all the ingredients are organic

7 hours ago

vips7L

Or just let people do the job the way they want.

6 hours ago

matheusmoreira

On the other hand, it seriously sucks to spend time learning a big codebase and modifying it with care, only to not be given the time of day when you send the patches to the maintainers. Sometimes the reward for this human labor isn't a sincere peer review of the work and a productive back-and-forth to iron out issues before merging, it's to watch one's work languish unnoticed for a long time only for the maintainer to show up after the fact and write his own fix or implementation while giving you a shout out in the commit message if you're lucky.

Can't really blame people for reducing their level of effort. It's very easy to put in a lot of effort and end up with absolutely nothing to show for it. Before AI came along, my realization was that begging the maintainers to implement the features I wanted was the right move. They have all the context and can do it better than us in a fraction of the time it'd take us to do it. Actually cloning someone else's repository and working on it should only be attempted if one is willing to literally fork it and own the project should things go south. Now that we have AI, it's actually possible to easily understand and modify complex codebases, and I simply cannot find the will to blame people for using it to the fullest extent. Getting the AI to maintain the fork is really easy too.

5 hours ago

jlarocco

> I agree this is very sane and boring. What is insane is that they have to state this in the first place.

I don't think it's insane. It seems reasonable that people could disagree about how much attribution and disclosure there should be about AI assistance, or if it's even allowed, etc.

Every document in that `process` directory explains stuff that could be obvious to some people but not others.

3 hours ago

cat_plus_plus

That's a dim view, people also contribute to make projects work for their own needs with hopes to share fixes with others. Like if I make a fix to vLLM to make a model load on particular hardware, I can verify functionality (LLM no longer strays off topic) and local plausibility (global scales are being applied to attention layers), but I can't pretend to understand full math of the overall process and will never have enough time to do so. So, I can be upfront about AI assist and then maintainer can choose to double check, or else if they don't have time, I guess I can just post a PR link on model's huggingface page and tell others with same hardware they can try to cherrypick it.

What's missed is that neither contributors nor maintainers are usually paid for their effort and nobody has standing to demand that they do anything they are not doing already. Don't like a messy vibe coded PR but need functionality? Then clean it up yourself and send improved version for review. Or let it be unmerged. But don't assign work to others you don't employ.

On the other hand, companies like NVIDIA should be publicly taken to task for changing their mind about instruction set for every new GPU and then not supporting them properly in popular inference engines, they certainly have enough money to hire people who will learn vLLM inside out and ensure high quality patches.

4 hours ago

lrvick

It cannot be overstated how religiously opposed many in the Linux community are to even a single AI assisted commit landing in the kernel no matter how well reviewed.

Plenty see Torvalds as a traitor for this policy and will never contribute again if any clearly labeled AI generated code is actually allowed to merge.

15 hours ago

cinntaile

Some people are just against change, that's nothing new. If Linus was like them, he would never have started linux in the first place.

14 hours ago

sdevonoes

Not every change is good, and sometimes we realise too late

12 hours ago

cinntaile

What is it that worries you about the change that is happening?

12 hours ago

drzaiusx11

For me it's always the fear of AI regurgitating something legally problematic directly from its training set: contributors unintentionally introducing copyright and licensing issues even with no intention of doing so.

Obviously these issues existed before AI, but they required active deception before. Regurgitating other people's code just becomes the norm now.

2 hours ago

cjfd

All kinds of worries are possible. (1) It turns out that all this AI generated stuff is full of bugs and we go back to traditional software development, creating a giant disinvestment and economic downturn. (2) Software quality goes way down and we cannot produce reliable programs anymore. (3) Massive energy use makes it impossible to use sustainable energy sources and we wreck the environment even more than we are currently doing. (4) AIs are in the hands of a few big companies that abuse their power. (5) AI becomes smarter than humans and decides that humans are outdated and kills all of us.

It obviously depends on how powerful AI is going to become. These scenarios are mutually exclusive because some assume that AI is actually not very powerful and some assume that it is very powerful. I think one of these things happening is not at all unlikely.

7 hours ago

rowyourboat

1 and 2 are really only an issue if you vibe code. There's no reason to expect properly reviewed AI assisted code to be any worse than human written code. In fact, in my experience, using LLMs to do a code review is a great asset, if used in addition to human review.

2 hours ago

intended

People have measurably lower levels of ownership and understanding of AI generated code. The people using GenAI reap a major time and cognitive effort savings, but the task of verification is shifted to the maintainer.

In essence, we get the output without the matching mental structures being developed in humans.

This is great if you have nothing left to learn; it's not that great if you are a newbie, or have low confidence in your skill.

> LLM users also struggled to accurately quote their own work. While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels.

> https://arxiv.org/abs/2506.08872

> https://www.media.mit.edu/publications/your-brain-on-chatgpt...

5 hours ago

xxpor

While I agree with this intuitively, I also just can't get past the argument that people said the same thing when we switched from everyone using ASM to C/Fortran etc.

4 hours ago

rowyourboat

> The people using GenAI reap a major time and cognitive effort savings, but the task of verification is shifted to the maintainer.

The people using GenAI should be the ones doing the verification. The maintainer's job should not meaningfully change (other than the maintainer using AI to review incoming code, of course).

Why does everyone who hears "AI code" automatically think "vibe-coded"?

2 hours ago

goatlover

Are they against change in general, or certain kinds of change? Remember when social media was seen as a near-universally good kind of progress? Not so much now.

13 hours ago

cinntaile

Social media has never been seen as a universal positive force? It's the same with AI. It has good and bad aspects as does any technology that has an impact on this scale, AI will arguably have a much bigger impact imo.

People are generally against change that forces them to change the way they used to do things. I'm sure most will have their reasons why they are against this particular change, but I don't think it will affect anything. The genie is out of the bottle, AI is here to stay. You either adapt or you will slowly wither away.

13 hours ago

dwedge

It reminds me of something I read on mastodon: "genie doesn't go back in the bottle say AI promoters while the industry spends a trillion dollars a year to try to keep the genie out of the bottle"

11 hours ago

cinntaile

Do you think the genie will go back in the bottle and why?

11 hours ago

matheusmoreira

It's certainly possible. All that is required is for AIs to become more expensive than humans. Developing projects on a $100 Claude Code subscription is a lot of fun. I bet people would simply go back to hiring human developers if that subscription cost $10,000 instead.

5 hours ago

gnz11

Adapting implies you are still a part of the environment though. AI is on a trajectory to replace you and take you out of the environment.

11 hours ago

bdangubic

AI is on a trajectory to replace people who do not effectively use AI with people that do

10 hours ago

gnz11

That is the bait and switch. The end goal is that you are out of the equation. Your perceived effectiveness at using AI as an exchange of labor diminishes over time to the point that you become irrelevant.

9 hours ago

brabel

Who has that end goal?? Who is going to direct the AI if only the CEO is left in the organization? The CEO will never actually do it, and will always need someone who can and will do it. I just can’t see a grand plan to take humans out of the equation entirely.

7 hours ago

bdangubic

that most definitely is a plan, make no mistake about it. but as mike tyson famously said, “everyone has a plan until they get punched in the mouth” :)

6 hours ago

bdangubic

this is certainly a possibility but human beings and societies as a whole adapt

7 hours ago

LtWorf

> Social media has never been seen as a universal positive force?

You missed the whole arab spring thing?

10 hours ago

cinntaile

If you selectively read one sentence of my comment, you risk missing the forest for the trees. I don't have any particular knowledge on the arab spring so I won't comment on that but I quite clearly said that technology has good and bad aspects to it.

10 hours ago

wafflemaker

Is it meant as sarcasm?

10 hours ago

contraposit

This is like blaming a knife for being a killer weapon. Social media is inherently good if owners of the platforms allow for good interactions to take place. But given the misaligned incentives, we don't have nice things.

13 hours ago

dwedge

"Social media is good if owners allow for good" is an example of the logical fallacy "begging the question".

11 hours ago

contraposit

Also, blaming the tool for the crime is some sort of fallacy. I don't know the name; you can ask AI.

6 hours ago

Luker88

Just remember that "reviewed" is not enough to not be considered public domain.

It needs to be modified by a human. No amount of prompting counts, and you can only copyright the modified parts.

Any license on "100% vibecoded" projects can be safely ignored.

I expect litigation in a few years where people argue about how much they can steal and relicense "since it was vibecoded anyway".

13 hours ago

shakna

For those who might wonder how accurate this is, there is advice from the Federal Register to this effect. [0] It's quite comprehensive, and covers pretty much every question that might be asked about "What about...?"

> In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of” and do “not affect” the copyright status of the AI-generated material itself.

[0] https://www.federalregister.gov/documents/2023/03/16/2023-05...

12 hours ago

martin-t

I cannot take seriously any politician or lawyer using the words "artificial intelligence", especially for models from 2023. These people have never used LLMs to write code. They'd know even current models need constant babysitting or they produce an unmaintainable mess; calling anything from 2023 AI is a joke. As the AI proponents keep saying, you have to try the latest model, so anything 2 years old is irrelevant.

There's really 2 ways to argue this:

- Either AI exists and then it's something new and the laws protecting human creativity and work clearly could not have taken it into account and need to be updated.

- Or AI doesn't exist, LLMs are nothing more than lossily compressed models violating the licenses of the training data, their probabilistically decompressed output is violating the licenses as well and the LLM companies and anyone using them will be punished.

12 hours ago

shakna

If monkeys can't hold copyright, which is an actual case discussed above, then no, an LLM probably can't either. "Human" is required.

10 hours ago

martin-t

Yeah, an LLM, being a machine obviously shouldn't hold copyright. But that doesn't stop people claiming that running vast amounts of code through an LLM can strip copyright from it.

Ultimately LLMs (the first L stands for large and for a good reason) are only possible to create by taking unimaginable amounts of work performed by humans who have not consented to their work being used that way, most of whom require at least being credited in derivative works and many of whom have further conditions.

Now, consent in law is a fairly new concept and for now only applied to sexual matters but I think it should apply to every human interaction. Consent can only be established when it's informed and between parties with similar bargaining power (that's one reason relationships with large age gaps are looked down upon) and can be revoked at any time. None of the authors knew this kind of mass scraping and compression would be possible, it makes sense they should reevaluate whether they want their work used that way.

There are 3 levels to this argument:

1) The letter of the law - if you understand how LLMs work, it's hard to see them as anything more than mechanical transformers of existing work so the letter should be sufficient.

2) The intent of the law - it's clear it was meant to protect human authors from exploitation by those who are in positions where they can take existing work and benefit from it without compensating the authors.

3) The ethics and morality of the matter - here it's blatantly obvious that using somebody's work against their wishes and without compensating them is wrong.

In an ideal world, these 3 levels would be identical but they're not. That means we should strive to make laws (in both intent and letter) more fair and just by changing them.

9 hours ago

MarsIronPI

If consent to use of your code in AI training can be revoked at any time, that makes training impossible, since if anyone ever withdraws consent, it's not like you can just take out their work from your finished model.

8 hours ago

martin-t

Yup. Not my problem.

You could even say it would very strongly incentivize the LLM companies to be on their best behavior, otherwise people would start revoking consent en masse and they'd have to keep training new models all the time.

If you want something more realistic, there would probably be time limits on how long they have to comply and how much they have to compensate the authors for the time it took them to comply.

There absolutely are ways to make it work in mutually beneficial ways, there's just no political will because of the current hype and because companies have learned they can get away with anything (including murder BTW).

3 hours ago

adrian_b

Almost all the productivity enhancement provided by an AI coding assistant is provided by circumventing the copyright laws, with the remaining enhancement being provided by the fact that it automates the search-copy-paste loop that you would do if you had direct access to the programs used during training.

(Much of the apparent gain of the automatic search-copy-paste is wasted by skipping the review that would have happened if the copying had been done manually; that review must then be done more slowly, because you have to review the entire, harder-to-understand program generated by the AI assistant.)

Despite the fact that AI coding assistants are copyright breaking tricks, the fact that this has become somehow allowed is an overall positive development.

The concept of copyright for programs has been completely flawed from its very beginning. The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

Any program is made by combining various standard patterns and program structures. You can construct a derivation sequence between almost any 2 programs, where you decompose the first into some typical blocks, then compose the second program from such blocks, while renaming all identifiers.

It is quite subjective to decide when a derivation sequence becomes complex enough that the second program should not be considered as a derivative of the first from the point of view of copyright.

The only way to avoid the copyright restrictions is to exploit loopholes in the law, e.g. if translating an algorithm to a different programming language does not count as being derivative or when doing other superficial automatic transformations of a source program changes its appearance sufficiently that it is not recognized as derivative, even if it actually is. Or when combining a great number of fragments from different programs is again not recognized as derivative, though it still kind of is.

The only reason it became possible for software companies like Microsoft or Adobe to copyright their s*t is that the software industry based on copyrighted programs was jumpstarted by a few decades of programming during which programs were not copyrighted, which could then be used as a base by the first copyrighted programs.

So AI coding agents allow you to create programs that you could not have written when respecting the copyright laws. They also may prevent you from proving that a program written by someone else infringes upon the copyright that you claim for a program written with assistance.

I believe that both these developments are likely to have more positive consequences than negative consequences. The methods used first in the USA and then also in most other countries (due to blackmailing by the USA) for abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades.

The most ridiculous claim about the copyright of programs is that it is somehow beneficial for "creators". Artistic copyrights sometimes are beneficial for creators, but copyrights on non-open-source programs are almost never owned by creators, but by their employers, and even those have only seldom any direct benefit from the copyright, but they use it with the hope that it might prevent competition.

6 hours ago

martin-t

> The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

And that's why copyright has exceptions for humans.

You're right copyright was the wrong tool for code but for the wrong reasons.

It shouldn't be binary. And the law should protect all work, not just creative. Either workers would come to a mutual agreement how much each contributed or the courts would decide based on estimates. Then there'd be rules about how much derivation is OK, how much requires progressively more compensation and how much the original author can plainly tell you what to do and not do with the derivative.

It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.

> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades

Can you give examples?

> copyrights on non-open-source programs are almost never owned by creators, but by their employers

Yes and that's another thing that's wrong with the system, employment is a form of abusive relationship because the parties are not equal. We should fix that instead of throwing out the whole system. Copyright which belongs to creators absolutely does give creators more leverage and negotiating power.

3 hours ago

martin-t

Nice, -4 points, somebody, many somebodies in fact, took that personally and yet were unable to express where they disagree in a comment.

Look, if you think I am wrong, you can surely put it into words. OTOH, if you don't think I am wrong but feel that way, then it explains why I see no coherent criticism of my statements.

9 hours ago

akerl_

When your comment is about how you can’t take your counterparty seriously and they’re a joke, you’re incentivizing people who disagree to just downvote and move on.

The signal you’re sending is that you are not open to discussing the issue.

7 hours ago

martin-t

It's a fallacy. Someone being utterly wrong, and my dismissing them for it, does not logically make my claim easily dismissible.

3 hours ago

akerl_

Yea, that’s exactly what I’m talking about.

25 minutes ago

lrvick

Meanwhile I expect that intellectual property protections for software are completely unenforceable and effectively useless now. If something does not exist as MIT, an LLM will create it.

The playing field is level now, and corpo moats no longer exist. I happily take that trade.

12 hours ago

Luker88

Isn't the "corpo moat" bigger now?

They can wash the copyright by AI training, but the AIs don't get trained on closed source.

"corpo" also has a ton of patents, which still can't be AI-washed.

What will become unenforceable are Open Source Licenses exclusively, how does that make it a "level field"?

12 hours ago

lrvick

Because AI is also proving to be very good at reverse engineering proprietary binaries or just straight up cloning software from test suites or user interfaces. Cuts both ways.

12 hours ago

Luker88

Reverse engineering is illegal in many jurisdictions, and especially in the USA thanks to the DMCA.

If the argument is just "They won't catch me", then yes you are correct.

But some of us are still forced to follow the law, whatever it might be.

Also: They still have patents on it.

6 hours ago

jayd16

So the argument is just "AI is magic and any kind of software can be rewritten for free"? Not really sure I buy it...

5 hours ago

martin-t

Have you ever seen what obfuscation looks like when somebody puts the effort in?

Not to mention companies will try to mandate hardware decryption keys so the binary is encrypted and your AI never even gets to analyze the code which actually runs.

It's not sci-fi, it's a natural extension of DRM.

12 hours ago

lrvick

Companies have been encrypting code to HSMs for decades. Never stopped humans from reverse engineering so it certainly will not stop AI aided by humans able to connect a Bus Pirate on the right board traces. Anything that executes on the CPU can be dumped with enough effort, and once dumped it can be decompiled.

10 hours ago

martin-t

You are agreeing with me, you just don't know it yet.

1) The financial aspect: As you say, more and more advanced DRM requires more and more advanced tools. Even assuming advanced AI can guide any human to do the physical part, that still means you have to pay for the hardware. And the hardware has to be available (companies have been known to harass people into giving up perfectly moral and legal projects).

2) The legal aspect: Possession of burglary tools is illegal in some places. How about possession of hacking tools? Right now it's not a priority for company lobbying, what about when that's the only way to decompile? Even today, reverse engineering is a legal minefield. Did you know in some countries you can technically legally reverse engineer but under some conditions such as having disabilities necessitating it and only using the result for personal use?[0]

3) The TOS aspect: What makes you think AI will help you? If the company owning the AI says so, you're on your own.

---

You need to understand 2 things:

- Just because something is possible doesn't mean somebody is gonna do it. Effort, cost and risk play huge roles. And that assumes no active hostile interference.

- History is a constant struggle between groups with various goals and incentives. Some people just want to live a happy life, have fun and build things in their free time. Other people want to become billionaires, dream about private islands, desire to control other people's lives and so on. People are good at what they focus on. There's perhaps more of the first group but the second group is really good at using their money and connections to create more money and connections which they in turn use to progress towards their primary objectives, usually at the expense of other people. People died[1] over their right to unionize. This can happen again.

Somebody might believe historical people were dumb or uncivilized and it can't happen today because we've advanced so much. That's bullshit. People have had largely the same wetware for hundreds of thousands of years. The tools have evolved but their users have not.

[0]: https://pluralistic.net/2026/03/16/whittle-a-webserver/ - "... aren't tools exemptions, they're use exemptions ... You have that right. Your mechanic does not have that right."

[1]: https://en.wikipedia.org/wiki/Pinkerton_(detective_agency)

8 hours ago

Muromec

I spent a fun week during Christmas figuring out some really obfuscated binary code with anti-debugging and anti-tampering measures in a cryptographic context. I didn’t use Ghidra or IDA or anything beyond gdb with DeepSeek chat in a browser. That low effort got me what I needed to get.

10 hours ago

martin-t

Exactly.

AI proponents completely ignore the disparity of resources available to an individual and a corporation. If I and a company of 1000 people create the same product and compete for customers, the company's version will win. Every single time. Or maybe at least 1000:1 if you're an optimist.

They have access to more money for advertising, they have an already established network of existing customers, they have legal and marketing experts on payroll. Or just look at Microsoft, they don't even need advertising, they just install their product by default and nobody will even hear about mine.

Not to mention, as you said, the training advances only go from open source to closed source, not the other way around.

AI proponents who talk about "democratization" are nuts, it would be laughable if it wasn't so sad.

12 hours ago

Muromec

> If I and a company of 1000 people create the same product and compete for customers, the company's version will win. Every single time.

As a person who works for a company with 25k people, I would disagree. You, a single person will often get to the basic product that a lot of people will want much faster than a company with 1k, 5k and 25k people.

Bigger companies are constrained by internal processes, piles of existing stuff, an inability to hire at the scale they need, and the larger context they require. Also regulation and all that. Bigger companies are also really slow to adapt, so they would rather let you build the product and then buy out your company with your product and the people who built it. They are at a temporary disadvantage every time the landscape shifts.

10 hours ago

martin-t

The point wasn't about the number of people, the point was a company which employs that number of people has enough money which can be converted to leverage against you.

Besides that, your whole argument hinges on large companies being inflexible, inefficient and poorly run. Isn't that exactly the kind of problem AI promises to solve? Complete AI surveillance of every employee, tasks and instructions tailored to each individual and superhuman planning. Of course at that point, the only employees will be manual workers because actual AI will be much better and cheaper at everything than every human, except those things where it needs to interact with the physical world. Even contract negotiations with both employees and customers will be done with AI instead of humans, the human will only sign off on it for legal requirements just like today you technically enter a contract with a representative of the company who is not even there when you talk to a negotiator.

9 hours ago

SpicyLemonZest

Large companies are often inflexible and inefficient as a matter of deliberate strategy. I've found myself in scenarios where we have a complete software artifact that a smaller company would launch and find successful, but we can't launch it, because we have to satisfy some expectation we've set or do a complex integration with some important other system of ours.

6 hours ago

martin-t

A lesson from gamedev is that players will deliberately restrict themselves - sometimes to make the game more fun or challenging, sometimes to appeal to their aesthetic principles.

If/when superhuman AI is achieved, those limitations will all go away. An owner will just give it money and control and tell it to optimize for more money or political power or whatever he wants.

That's a much scarier future than a paperclip maximizer because it's much closer and it doesn't require complete takeover first, it'll be just business as usual, except somehow more sociopathic.

3 hours ago

Luker88

> If something does not exist as MIT, an LLM will create it.

Nitpicking on the license here, but please don't use MIT, it has no patent grant protections.

And those are never covered in any AI-washing anyway.

There are equivalent licenses with patent grant protection, like 'Apache2+LLVM exception' or 'Mozilla Public License 2' and others...

6 hours ago

adrianN

The corporate moat is the army of lawyers they have. It doesn’t matter whether they win or not if you can’t afford endless litigation. It’s the same for patents.

12 hours ago

Marha01

Funny, their army of lawyers seems incapable of stopping me from easily downloading pirated software or coding an open alternative to their closed-source software with AI if I wanted to..

You cannot keep a purely legally-enforced moat in the face of advancing technology.

11 hours ago

Luker88

I would caution against using this argument.

In the USA the DMCA can make it illegal to even own and use tools meant to bypass even the weakest of protection.

This law has already been used to ruin lives.

"They might catch the individual but not us all" is nice and fine until it is your turn, so check your legislation.

6 hours ago

lrvick

The music industry has an army of lawyers too, and it did not make a damn bit of difference once BitTorrent was popularized.

IP law means nothing once tens of millions of people are openly violating it.

The software industry is about to learn this lesson too.

12 hours ago

dwedge

So is music free now? The record industry doesn't exist anymore, isn't ridiculously profitable? Artists are finally earning a fair share?

11 hours ago

lrvick

Music is free, because music piracy is unenforceable so the law is irrelevant. Now, I personally buy most of my music on vinyl because I want to support artists, but absolutely nothing forces me to do that as all the music is available for free.

10 hours ago

Sharlin

As far as I can see, the vast majority of people don’t pirate music these days (unlike 20 years ago). Most people wouldn’t even know where and how to pirate music. They just have Spotify or another streaming service.

6 hours ago

Marha01

> So is music free now?

Uhm... yes? The cost of downloading pirated music is essentially zero. The only reason why people use services like Spotify is because it's extremely cheap while being a bit more convenient. But jack up the price and the masses will move to sail the sea again.

11 hours ago

dwedge

The cost of stealing has always been essentially zero. Same argument can be made for streaming, and yet Netflix is neither cheap nor struggling for subscribers.

10 hours ago

Marha01

> The cost of stealing has always been essentially zero.

That is not necessarily true, depending on the level of enforcement and the availability of opportunities to steal.

> Same argument can be made for streaming, and yet Netflix is neither cheap nor struggling for subscribers.

Netflix is still pretty cheap for the convenience it provides. Again, jack up the price and see the masses move to torrent movies/shows again.

10 hours ago

Applejinx

In the sense that artists cannot expect to get any money for their work, yeah, music's free. Becoming a meme or a celebrity on the grounds of personality is still fair game, to the extent that AI is not impersonating people effectively at scale yet.

Yet.

A whole bunch of people I watch on youtube (politics, analysts, a weatherman) are already seeing AI impersonation videos, sometimes misrepresenting their positions and identities. This will grow.

So, you can't create art because that's extruded at scale in such a way that it's just turning on the tap to fill a specified need, and you can't be a person because that can also be extruded at scale pretty soon, either to co-opt whatever you do that's distinct, or to contradict whatever you're trying to say, as you.

As far as being a person able to exist and function through exchanging anything you are or anything you do for recompense, to survive, I'm not sure that's in the cards. Which seems weird for a technology in the guise of aiding people.

10 hours ago

jayd16

This means that all copyleft is MIT but it doesn't change the closed source stuff... So once again it benefits corpo more than most.

5 hours ago

ako

Generating software still costs tokens; generating something like MS Word will still cost a significant amount and take a lot of human effort to prompt and validate. Having a proven solution still has value.

10 hours ago

lrvick

You can already generate surprisingly complex software on an LLM on a Raspberry Pi now, including live voice assistance, all offline. People's hardware can write software by itself pretty readily now. The cost of tokens is a race to zero.

9 hours ago

nonameiguess

Ironically, I actually suspect the exact opposite. Linux has no real choice in this matter because most of the code is written by Google, Red Hat, Cisco, and Amazon at this point, and these big cos are all going to mandate their developers have to use AI coding agents. Refuse to accept these contributions and we're just going to end up with 20 Linuxes instead of one, and the original still under the control of Linus will be relegated to desktop usage and wither and die.

10 hours ago

VorpalWay

> Any license on "100% vibecoded" projects can be safely ignored.

As far as I know that has only been decided in the US so far, which is far from the whole world.

12 hours ago

Luker88

There was a study from the US copyright office that found a single jurisdiction where the output of an AI prompt is copyrightable: China.

Everything else is various shades of "No, unless a human modified it"

edit: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

6 hours ago

IsTom

In Poland law is similar in this regard, so I'd assume at least some other countries do this as well.

11 hours ago

OtomotO

So, how are you gonna prove I didn't write some code?

How am I gonna prove I did?

11 hours ago

adrian_b

They do not have to prove anything.

They can just generate the same code with an AI assistant, and then it is you who cannot claim that their code infringes the copyright that you claim for the code that you have written with assistance.

So neither of the 2 parties that have used an AI assistant is able to prevent the other party from using the generated code.

I consider this as a rather good outcome and not as a disadvantage of using AI assistants. However, this may be construed as a problem by the stupid corporate lawyers who insist that any product of the company must use only software IP that is the property of the company.

These kinds of lawyers are encountered in many companies and they are the main reason for the low software productivity that was typical in many places before the use of AI assistants.

I wonder how many of those lawyers have already understood that this new fashion of using AI is incompatible with their mandated policies, which have always been the main blocker against efficient software reuse.

5 hours ago

OtomotO

I was talking more generally about the "You can't patent or copyright code that was generated with an LLM".

Who can prove that I didn't write the code myself? And if I did, how am I to prove it?

That goes in both directions.

It's not like there is a watermark in the code telling the whole wide world that this was AI generated or human made.

So I write code (with or without an AI assistant) and claim copyright... they generate the same code. I sue them.

How does any of us prove that we wrote the code by hand?

4 hours ago

alfiedotwtf

In what jurisdiction?!

It’s weird how people on HN state legal opinion as fact… e.g. if someone in the Philippines vibecodes an app and a person in Ecuador vibecodes a 100% copy of the source, what now?

7 hours ago

Luker88

There was a study from the US copyright office that found a single jurisdiction where the output of an AI prompt is copyrightable: China.

Everywhere else in the world is in various shades of "No, unless a human modified it"

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

6 hours ago

Sharlin

There’s this thing called the Berne Convention. Countries that cooperate on copyright are going to standardize their interpretations on questions like this sooner or later.

6 hours ago

martin-t

I don't think being modified by a human is enough. If you take licensed text (code or otherwise) and manually replace every word with a synonym, it does not remove the license. If you manually change every loop into a map/filter, it does not remove the license. I don't think any amount of mechanical transformation, regardless of whether it is done by a human or a machine, erases it.
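To make the loop-to-map/filter point concrete, here is a minimal Python sketch. The function names are made up for illustration and are not taken from any real project. The two functions look different on the page, but the second is only a mechanical rewrite of the first and adds no new logic.

```python
def squares_of_evens_loop(values):
    """Original form: an explicit loop with a condition."""
    result = []
    for v in values:
        if v % 2 == 0:
            result.append(v * v)
    return result


def squares_of_evens_functional(values):
    """Mechanically rewritten form: the same logic expressed as filter + map."""
    return list(map(lambda v: v * v, filter(lambda v: v % 2 == 0, values)))


sample = [1, 2, 3, 4, 5, 6]
# Both versions compute exactly the same thing for the same input.
print(squares_of_evens_loop(sample) == squares_of_evens_functional(sample))  # True
```

Whether such a rewrite changes the licensing situation is a legal question the code cannot answer; the sketch only shows how little the transformation changes.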

There's a threshold where you modify it enough, it is no longer recognizable as being a modification of the original and you might get away with it, unless you confess what process you used to create it.

This is different to learning from the original and then building something equivalent from scratch using only your memory without constantly looking back and forth between your copy and the original.

This is how some companies do "clean room reimplementations" - one team looks at the original and writes a spec, another team which has never seen the original code implements an entirely standalone version.

And of course there are people who claim this can be automated now[0]. This one is satire (read the blog) but it is possible if the law is interpreted the way LLM companies work and there are reports the website works as advertised by people who were willing to spend money to test it.

[0]: https://malus.sh/

12 hours ago

lrvick

You only need to feed the docs and tests to an LLM to get a "clean room" re-implementation that can then be relicensed.

10 hours ago

Muromec

That wasn't tested legally.

10 hours ago

lrvick

If they actually were decided to be infringements somehow, there are millions of different cases needed already, so it is already past the point of enforcement.

These sorts of things are almost never tested legally and it seems even less likely now.

8 hours ago

beAbU

It cannot be overstated how religiously opposed many in the woodworking community are to even a single table saw assisted cut making its way to a piece of furniture, no matter how well designed.

Plenty see {{some_woodworker}} as a traitor for this policy and will never contribute again if any clearly labeled table saw cuts are actually allowed to be used in furniture making.

4 hours ago

agentultra

There's a stark difference between a table saw and an LLM that weakens this argument.

A table saw isn't a probabilistic device.

3 hours ago

theshrike79

But I, a woodworker, can immediately see if the piece of wood that came out of the table saw looks like it should.

Also I, a programmer, can immediately see whether the "probabilistic device" generated code that looks like it should.

Both just let me get to the same result faster with good enough quality for the situation.

I can grab a tape measure or calipers and examine the piece of wood I cut on the table saw and check if it has the correct measurements. I can also use automated tests and checks to see that the code produced looks as it should and acts as it should.

If it looks like a duck and quacks like a duck... Do we really need to care if the duck was generated by an AI?

3 hours ago

agentultra

> Also I, a programmer, can immediately see whether the "probabilistic device" generated code that looks like it should.

I highly doubt that.

Empirical studies show that humans have very little effect on error rates when reviewing code. That effect disappears quickly the more code you read.

Most programmers are bad at detecting UB and memory ownership and lifetime errors.

When a piece of wood comes off the table, it’s cut or it’s not.

Code is far more complex.

41 minutes ago

theshrike79

> Most programmers are bad at detecting UB and memory ownership and lifetime errors.

And this is why we have languages and tooling that take care of it.

There's only a handful of people who can one-shot perfect code in a language that doesn't guard against memory ownership or lifetime errors every time.

But even the crappiest programmer has to actually work against the tooling in a language like Rust to create ownership issues. Add linters, formatters and unit tests on top of that and it becomes nigh-impossible.

Now put an LLM in the same position, it's also unable to create shitty code when the tooling prevents it from doing so.

33 minutes ago

agentultra

A piece of wood is either cut to spec or not. You don’t have to try and convince the table saw with a prompt that it is a table saw.

These tools are nothing alike and the reductionism of this metaphor isn’t helpful.

4 minutes ago

beAbU

Anyone who has used a table saw before knows it's anything but probabilistic. Just a little carelessness and you cut your thumb off.

As with LLMs, where careless use results in you dropping prod db or exposing user data.

2 hours ago

oompydoompy74

I find the strong anti AI sentiment just as annoying as the strong pro AI sentiment. I hope that the extremes can go scream in their own echo chamber soon, so that the rest of us can get back to building and talking about how to make technology useful.

4 hours ago

Klonoar

Reads like a “fuck you and I’ll see you tomorrow” threat.

6 hours ago

dxdm

Sounds dramatic, but it entirely depends on what "many" and "plenty" mean in your comment, and who exactly is included. So far, what you wrote can be seen as an expected level of drama surrounding such projects.

13 hours ago

ebbi

True - on Mastodon there is a very vocal crowd that is against AI in general, and is identifying Linux distros that have AI generated code with the view of boycotting them.

14 hours ago

lrvick

Soon they will have to boycott all of them. Then what, I wonder?

12 hours ago

positron26

What these hardliners are standing for, I have no idea. If the code passes review, we're just arguing about hues of zeros and ones. "AI" is an attribute that type-erases entirely once an engineer pulls out the useful expressions and whips them into shape.

The worst part about all reactionary scares is that, because the behaviors are driven by emotion and feeling as opposed to any intentional course of action, the outcomes are usually counter productive. The current AI scare is exactly what you would want if you are OpenAI. Convince OSS, not to mention "free" software people, to run around dooming and ant milling each other about "AI bad" and pretty soon OSS is a poisonous minefield for any actual open AI, so OSS as a whole just sabotages itself and is mostly out of the fight.

I'm currently in the middle of trying to blow straight past this gatekeepy outer layer of the online discourse. What is a bit frustrating is knowing that while the seed will find the niches and begin spreading through invisible channels, in the visible channels, there's going to be all kinds of knee-jerk pushback from these anti-AI hardliners who can't distinguish between local AI and paying Anthropic for a license to use a computer. Worse, they don't care. The social psychosis of being empowered against some "others" is more important. Either that or they are bots.

And all of this is on top of what I've been saying for over a year. VRAM efficiency will kill the datacenter overspend. Local, online training will make it so that skilled users get better models over time, on their own data. Consultative AI is the future.

I have to remind myself that this entire misstep is a result of a broken information space, late-stage traditional social, filled with people (and "people") who have been programmed for years on performative clap-backs and middling ideas.

So fortunate to have some life before internet perspective to lean back on. My instinct and old-world common sense can see a way out, but it is nonetheless frustrating to watch the online discourse essentially blinding itself while doubling down on all this hand wringing to no end, accomplishing nothing more than burning a few witches and salting their own lands. You couldn't want it any better if you were busy entrenching.

4 hours ago

abc123abc123

Doesn't matter. Linux today is a toy of corporations and stopped being community oriented a long time ago. Community orientation I think these days only exists among the BSD and some fringe linux distributions.

The linux foundation itself, is just one big, woke, leftist mess, with CV-stuffers from corporations in every significant position.

11 hours ago

simonask

The idea that something can simultaneously be "woke [and] leftist" and somehow still defined by its attachments to corporations is a baffling expression of how detached from reality the US political discourse is.

The rest of the world looks on in wonder at both sides of this.

11 hours ago

Sharlin

"I hate corporations and I hate leftists, ergo they must be the same thing"

6 hours ago

oompydoompy74

I wish everyone could be so rational, well reasoned, and balanced on this subject.

4 hours ago

galaxyLogic

But then if AI output is not under GNU General Public License, how can it become so just because a Linux-developer adds it to the code-base?

a day ago

jillesvangurp

AIs are not human and therefore their output is a human authored contribution and only human authored things are covered by copyright. The work might hypothetically infringe on other people's copyright. But such an infringement does not happen until a human decides to create and distribute a work that somehow integrates that generated code or text.

The solution documented here seems very pragmatic. You as a contributor simply state that you are making the contribution and that you are not infringing on other people's work with that contribution under the GPLv2. And you document the fact that you used AI for transparency reasons.

There is a lot of legal murkiness around how training data is handled, and the output of the models. Or even the models themselves. Is something that in no way or shape resembles a copyrighted work (i.e. a model) actually distributing that work? The legal arguments here will probably take a long time to settle but it seems the fair use concept offers a way out here. You might create potentially infringing work with a model that may or may not be covered by fair use. But that would be your decision.

For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of say a for loop in the contribution to some for loop in somebody else's code base would be anything else than coincidence or fair use.

a day ago

heavyset_go

The Copyright Office's interpretation of US copyright laws says that AI is not human, thus not an attributable author for copyright registration, and output based on mere prompting is no one's IP, it can't be copyrighted[1].

When AI output can be copyrighted is when copyrighted elements are expressed in it, like if you put copyrighted content in a prompt and it is expressed in the output, or the output is transformed substantially with human creativity in arrangement, form, composition, etc.

[1] https://newsroom.loc.gov/news/copyright-office-releases-part...

14 hours ago

nitwit005

That you can't copyright the AI's output (in the US, at least), doesn't imply it doesn't contain copyrighted material. If you generate an image of a Disney character, Disney still owns the copyright to that character.

a day ago

NitpickLawyer

> That you can't copyright the AI's output (in the US, at least),

It's also not really clear if you can or cannot copyright AI output. The case that everyone cites didn't even reach the point where courts had to rule on that. The human in that case decided to file the copyright for an AI, and the courts ruled that according to the existing laws copyright must be filed by a person/human/whatever.

So we don't yet have caselaw where someone used AIgen and claimed the output as written by them.

16 hours ago

metalcrow

You can copyright AI output assuming there is a "reasonable" degree of human involvement. https://www.cnet.com/tech/services-and-software/this-company...

14 hours ago

fxtentacle

Yes. And that’s why the rules say that the human submitting the code is responsible for preventing this case.

18 hours ago

friendzis

> Is something that in no way or shape resembles a copyrighted work (i.e. a model) actually distributing that work?

Does a digitally encoded version resemble a copyrighted work in some shape or form? </snark>

Where is this hangup on models being something entirely different than an encoding coming from? Given enough prodding they can reproduce training data verbatim or close to that. Okay, given enough prodding notepad can do that too, so uncertainty is understandable.

This is one of the big reasons companies are putting effort into the so called "safety": when the legal battles are eventually fought, they would have an argument that they did their best so that the amount of prodding required to extract any information potentially putting them under liability is too great to matter.

15 hours ago

jillesvangurp

> Does a digitally encoded version resemble a copyrighted work in some shape or form? </snark>

Well that's different because an encoded image or video clearly intends to reproduce the original perfectly and the end result after decoding is (intentionally) very close to the form of the original. Which makes it a clear cut case of being a copy of the original.

The reason so many cases don't get very far is that mostly judges and lawyers don't think like engineers. Copyright law predates most modern technology. So, everything needs to be rephrased in terms of people copying stuff for commercial gain. The original target of the law was people using printing presses to create copies of books written by others. Which was hugely annoying to some publishers who thought they had exclusive deals with authors. But what about academics quoting each other? Or literary reviews. Or summaries. Or people reading from a book on the radio? This stuff gets complicated quickly. Most of those things were settled a long time ago. Fair use is a concept that gets wielded a lot for this. Yes, it's a copy, but it's entirely reasonable for the copy holder to be doing what they are doing and therefore not considered an infringement.

The rest is just centuries of legal interpretation of that and how it applies to modern technology. Whether that's DJs sampling music or artists working visual imagery into their artworks. AI is mostly just more of the same here. Yes there are some legally interesting aspects with AI but not that many new ones. Judges are unlikely to rethink centuries of legal interpretations here and are more likely to try to reconcile AI with existing decisions. Any changes to the law would have to be driven by politicians; judges tend to be conservative with their interpretations.

9 hours ago

ninjagoo

IANAL; this is what my limited understanding of the matter is. With that caveat: it is easy to forget that copyright is on output: verbatim or exact reproductions and derivatives of a covered work are already covered under copyright.

So if the AI outputs Starry Night or Starry Night in a different color theme, that's likely infringement without permission from van Gogh, who would have recourse against someone, either the user or the AI provider.

But a starry-night style picture of an aquarium might not be infringing at all.

>For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of say a for loop in the contribution to some for loop in somebody else's code base would be anything else than coincidence or fair use.

I would argue that if it was a verbatim reproduction of a copyrighted piece of software, that would likely be infringing. But if it was similar only in style, with different function names and structure, probably not infringing.

Folks will argue that some things might be too small to do any different, for example a tiny snippet like python print("hello") or 1+1=2 or a for loop in your example. In that case it's too lacking in original expression to qualify for copyright protection anyway.

a day ago

inglor_cz

Starry Night is public domain everywhere (van Gogh died 136 years ago and AFAIK there is no place on Earth that would have copyright that long).

But your point still stands.

5 hours ago

Lerc

>AIs are not human and therefore their output is a human authored contribution and only human authored things are covered by copyright.

That is a non sequitur. Also, I'm not sure if copyright applies to humans, or persons (not that I have encountered particularly creative corporations, but Taranaki Maunga has been known for large scale decorative works)

a day ago

Sharlin

18 hours ago

direwolf20

A "large scale decorative work" is the strangest euphemism for a dormant volcano I've ever heard.

15 hours ago

Lerc

Well obviously it's not doing any decorating right at the moment.

12 hours ago

mcv

Didn't a court in the US declare that AI generated content cannot be copyrighted? I think that could be a problem for AI generated code. Fine for projects with an MIT/BSD license I suppose, but GPL relies on copyright.

However, if the code has been slightly changed by a human, it can be copyrighted again. I think.

a day ago

simonw

Thaler v. Perlmutter said that an AI system cannot be listed as the sole author of a work - copyright requires a human author.

US Copyright Office guidance in 2023 said work created with the help of AI can be registered as long as there is "sufficient human creative input". I don't believe that has ever been qualified with respect to code, but my instinct is that the way most people use coding agents (especially for something like kernel development) would qualify.

21 hours ago

davemp

Interesting. That seems to suggest that one would need to retain the prompts in order to pursue copyright claims if a defendant can cast enough doubt on human authorship.

Though I guess such a suit is unlikely if the defendant could just AI wash the work in the first place.

18 hours ago

tadfisher

No, a court did not declare that. The case involved a person trying to register a work with only the AI system listed as author. The Supreme Court decided that you can't do that, you need to list a human being as author to register a work with the Copyright Office. This stems from existing precedent where someone tried to register a photograph with the monkey photographer listed as author.

I don't believe the idea that humans can or can't claim copyright over AI-authored works has been tested. The Copyright Office says your prompt doesn't count and you need some human-authored element in the final work. We'll have to see.

21 hours ago

papercrane

It's almost a certainty that you can't copyright code that was generated entirely by an AI.

Copyright requires some amount of human originality. You could copyright the prompt, and if you modify the generated code you can claim copyright on your modifications.

The closest applicable case would be the monkey selfie.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

19 hours ago

abletonlive

It's almost certain that you're wrong. It's like saying I can't copyright a song if my modular synthesizer generated it. Why would you think this?

18 hours ago

manwe150

I’m curious to see if subscription vs free ends up mattering here. If it is a work for hire, generally it doesn’t matter how the work was produced, the end result is mine, because I contracted and instructed (prompted?) someone to do it for me. So will the copyright office decide it cares if I paid for the AI tool explicitly?

19 hours ago

galaxyLogic

That would depend on whether those who sold you the software-output, had copyright to it.

14 hours ago

RussianCow

> Didn't a court in the US declare that AI generated content cannot be copyrighted?

No, my understanding is that AI generated content can't be copyrighted by the AI. A human can still copyright it, however.

a day ago

Sharlin

It's obvious that a computer program cannot have copyright because computer programs are not persons in any currently existing jurisdiction.

Whether a person can claim copyright of the output of a computer program is generally understood as depending on whether there was sufficient creative effort from said person, and it doesn't really matter whether the program is Photoshop or ChatGPT.

18 hours ago

paradoxyl

Just thinking out loud... why can't an algorithm be an artificial person in the legal sense that a corporation is? Why not legally incorporate the AI as a corporation so it can operate in the real world: have accounts, create and hold copyrights...

18 hours ago

direwolf20

Because the law doesn't say it can. It's that simple.

15 hours ago

SpicyLemonZest

Corporations are required to have human directors with full operational authority over the corporation's actions. This allows a court to summon them and compel them to do or not do things in the physical world. There's no reason a corporation can't choose to have an AI operate their accounts, but this won't affect the copyright status, and if the directors try to claim they can't override the AI's control of the accounts they'll find themselves in jail for contempt the first time the corporation faces a lawsuit.

17 hours ago

galaxyLogic

So if creative effort was put into writing the prompt, then whoever wrote the prompt should have the copyright to the output produced by ChatGPT?

14 hours ago

LtWorf

Sure, but the prompt wasn't the only input… there was considerable effort put into the training data as well :)

10 hours ago

singpolyma3

Public domain code is GPL compatible

19 hours ago

afro88

Same as if a regular person did the same. They are responsible for it. If you're using AI, check the code doesn't violate licenses

a day ago

rzmmm

In certain legal cases, a finding of plagiarism can be influenced by whether the person was exposed to the copyrighted work. AI models are exposed to a very large corpus of works.

a day ago

cxr

Copyright infringement and plagiarism are not the same or even very closely related. They're different concepts and not interchangeable. Relative to copyright infringement, cases of plagiarism are rarely a matter for courts to decide or care about at all. Plagiarism is primarily an ethical (and not civil or criminal) matter. Rather than be dealt with by the legal system, it is the subject of codes of ethics within e.g. academia, journalism, etc. which have their own extra-judicial standards and methods of enforcement.

a day ago

dekhn

I suspect they were instead referring to patents; for example, when I worked at Google, they told the engineers not to read patents because then the engineer might invent something infringing. I think it's called willful infringement. No other employer I've worked for has ever raised this as an issue, while many lawyers at Google would warn against this.

a day ago

martin-t

You're right, legally speaking.

But you shouldn't be right. I mean, morally.

The law is a compromise between what the people in power want and what they can get away with without people revolting. It has nothing to do with morality, fairness or justice. And we should change that. The promise of democracy was (among other things) that everyone would be equal, everybody would get to vote and laws would be decided by the moral system of the majority. And yet, today, most people will tell you they are unhappy about the rising cost of living and rising inequality...

The law should be based on a complete and consistent moral system. And then plagiarism (taking advantage of another person's intellectual work without credit or compensation) would absolutely be a legal matter.

8 hours ago

martin-t

As opposed to an irregular person?

LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).

A human has moral value a text model does not. A human has limitations in both time and memory available, a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en-masse.

The rules of copyright allow humans to do certain things because:

- Learning enriches the human.

- Once a human consumes information, he can't willingly forget it.

- It is impossible to prove how much a human-created intellectual work is based on others.

With LLMs:

- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.

- It's perfectly possible to create a model based only on content with specific licenses or only public domain.

- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.

a day ago

afro88

Dude come on, I clearly wasn't saying LLMs are people. My point was it's a tool and it's the responsibility of the person wielding it to check outputs.

If it's too hard to check outputs, don't use the tool.

Your arguments about copyright being different for LLMs: at the moment that's still being defined legally. So for now it's an ethical concern rather than a legal one.

For what it's worth I agree that LLMs being trained on copyright material is an abuse of current human oriented copyright laws. There's no way this will just continue to happen. Megacorps aren't going to lie down if there's a piece of the pie on the table, and then there's precedent for everyone else (class action perhaps)

16 hours ago

martin-t

Alright, I did make that assumption because I've seen and heard people talk about LLMs as people. It worries me that otherwise functional and reasonable people, some of them my friends, have so easily been convinced by a machine which demonstrates its flaws to me daily.

As for checking outputs - I don't believe that's sufficient. Maybe the letter of the law is flawed but according to the spirit the model itself is derivative work.

A model takes several orders of magnitude more work in the form of training data than it takes to code the training algorithm itself. To any reasonable and sane person, that makes it a derivative work of the training data by nearly 100% - we can only argue how many nines it should be.

> precedent

Yeah but the US system makes me very uneasy about it. The right way to do this is to sit down, talk about the options and their downstream implications, talking about fairness and justice and then deciding what the law should be. If we did that, copyright law would look very different in the first place and this whole thing would have an obvious solution.

3 hours ago

sarchertech

How could you do that though? You can’t guarantee that there aren’t chunks of copied code that infringe.

a day ago

Andrex

Let me introduce you to the concept of submarine patents...

a day ago

shevy-java

But the responsible party is still the human who added the code. Not the tool that helped do so.

a day ago

aargh_aargh

The practical concern of Linux developers regarding responsibility is not about being able to ban the author; it's that the author should take ongoing care of his contribution.

a day ago

Cytobit

That's not going to shield the Linux organization.

a day ago

cxr

A DCO bearing a claim of original authorship (or assertion of other permitted use) isn't going to shield them entirely, but it can mitigate liability and damages.

a day ago

sarchertech

Can it though? As far as I know this hasn’t been tested.

20 hours ago

sarchertech

In a court case the responsible party very well could be the Linux Foundation because this is a foreseeable consequence of allowing AI contributions. There’s no reasonable way for a human to make such a guarantee while using AI generated code.

a day ago

Chance-Device

It’s not about the mechanism: responsibility is a social construct, it works the way people say that it works. If we all agree that a human can agree to bear the responsibility for AI outputs, and face any consequences resulting from those outputs, then that’s the whole shebang.

a day ago

sarchertech

Sure we could change the law. It would be a stupid change to allow individuals, organizations, and companies to completely shield themselves from the consequences of risky behaviors (more than we already do) simply by assigning all liability to a fall guy.

a day ago

Chance-Device

What law exactly are you suggesting needs to be changed? How is this any different from what already happens right now, today?

a day ago

sarchertech

Right now it's very easy not to infringe on copyrighted code if you write the code yourself. In the vast majority of cases if you infringed it's because you did something wrong that you could have prevented (in the case where you didn't do anything wrong, independent creation is an affirmative defense against copyright infringement).

That is not the case when using AI generated code. There is no way to use it without the chance of introducing infringing code.

Because of that if you tell a user they can use AI generated code, and they introduce infringing code, that was a foreseeable outcome of your action. In the case where you are the owner of a company, or the head of an organization that benefits from contributors using AI code, your company or organization could be liable.

a day ago

galaxyLogic

So it's a bit as if Linux Organization told its contributors you can bring in infringing code but you must agree you are liable for any infringement?

But if a lawsuit was later brought who would be sued? The individual author or the organization? In other words can an organization reduce its liability if it tells its employees "You can break the law as long as you agree you are solely responsible for such illegal actions"?

It would seem to me that the employer would be liable if they "encourage" this way of working?

14 hours ago

Chance-Device

It’s a foreseeable outcome that humans might introduce copyrighted code into the kernel.

I think you’re looking for problems that don’t really exist here, you seem committed to an anti AI stance where none is justified.

a day ago

sarchertech

A human has to willingly violate the law for that to happen, though. There is no way for a human to use AI generated code that doesn't have a chance of producing copyrighted code. That's just expected.

If you don't think this is a problem take a look at the terms of the enterprise agreements from OpenAI and Anthropic. Companies recognize this is an issue and so they were forced to add an indemnification clause, explicitly saying they'll pay for any damages resulting from infringement lawsuits.

a day ago

johnisgood

> Right now it's very easy not to infringe on copyrighted code if you write the code yourself.

Humans routinely produce code similar to or identical to existing copyrighted code without direct copying.

20 hours ago

sarchertech

They don’t produce enough similar code to infringe frequently. And if they did independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs since they have the demonstrated capability to produce code directly from their training set.

20 hours ago

johnisgood

You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

On independent creation: you are conflating the tool with the user. The defense applies to whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly, they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose independent creation defense because they have "demonstrated capability to produce code directly from" their memory.

LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory: we do not deny them independent creation defense wholesale because of that capability!

In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for independent creation defense creates a double standard that human-written code does not face.

20 hours ago

sarchertech

> You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

Practically speaking humans do not produce code that would be found in court to be infringing without intent.

It is theoretically possible, but it is not something that a reasonable person would foresee as a potential consequence.

That’s the difference.

> LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case).

Exactly. It is a documented failure mode that you as a user have no capacity to mitigate or to even be aware is happening.

Double standards are perfectly fine. LLMs are not conscious beings that deserve protection under the law.

>not settled.

What appears to likely be settled is that human authorship is required, so there’s no way that an LLM could qualify for independent creation.

3 hours ago

direwolf20

And that's not an infringement. Actual copying is the infringement, not having the same code. The most likely way to have the same code is by copying, but it's not the only way.

15 hours ago

bpt3

In this case, the "fall guy" is the person who actually introduced the code in question into the codebase.

They wouldn't be some patsy that is around just to take blame, but the actual responsible party for the issue.

a day ago

sarchertech

Imagine you're a factory owner and you need a chemical delivered from across the country, but the chemical is dangerous and if the tanker truck drives faster than 50 miles per hour it has a 0.001% chance per mile of exploding.

You hire an independent contractor and tell him that he can drive 60 miles per hour if he wants to but if it explodes he accepts responsibility.

He does and it explodes killing 10 people. If the family of those 10 people has evidence you created the conditions to cause the explosion in order to benefit your company, you're probably going to lose in civil court.

Linus benefits from the increased velocity of people using AI. He doesn't get to put all the liability on the people contributing.
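
To put a rough number on the analogy: a small per-mile risk compounds over a long trip. The sketch below assumes a 2,500-mile route, which is a made-up figure (the comment only says "across the country"), and assumes each mile is an independent trial. The names `PER_MILE_RISK`, `TRIP_MILES` and `trip_risk` are illustrative only.

```python
# Rough arithmetic for the tanker analogy.
# PER_MILE_RISK comes from the comment (0.001% per mile).
# TRIP_MILES is an assumption; the comment gives no distance.
PER_MILE_RISK = 0.001 / 100
TRIP_MILES = 2500


def trip_risk(per_mile, miles):
    """Probability of at least one explosion over the trip,
    treating every mile as an independent trial."""
    return 1.0 - (1.0 - per_mile) ** miles


risk = trip_risk(PER_MILE_RISK, TRIP_MILES)
print(f"Chance of an explosion over the whole trip: {risk:.2%}")
```

Under those assumptions the whole-trip risk comes out at roughly 2.5%, which is why a per-mile figure that sounds negligible can still count as a foreseeable outcome.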

a day ago

raincole

Cool analogy! Which has nothing to do with the topic at hand.

15 hours ago

bpt3

That is a nonsensical analogy on multiple levels, and doesn't even support your own argument.

20 hours ago

sarchertech

Nice rebuttal.

20 hours ago

bpt3

Why would I put much effort into responding to a post like yours, which makes no sense and just shows that you don't understand what you're talking about?

20 hours ago

sarchertech

Why would you put any effort into it at all?

2 hours ago

lo_zamoyski

Responsibility is an objective fact, not just some arbitrary social convention. What we can agree or disagree about is where it rests, but that's a matter of inference, an inference can be more or less correct. We might assign certain people certain responsibilities before the fact, but that's to charge them with the care of some good, not to blame them for things before they were charged with their care.

a day ago

bitwize

Because contributions to Linux are meticulously attributed to, and remain property of, their authors, those authors bear ultimate responsibility. If Fred Foobar sends patches to the kernel that, as it turns out, contain copyrighted code, then provided upstream maintainers did reasonable due diligence the court will go after Fred Foobar for damages, and quite likely demand that the kernel organization no longer distribute copies of the kernel with Fred's code in it.

a day ago

sarchertech

Anyone distributing infringing material can be liable, and it’s unlikely that this technicality would actually shield anyone.

Anyone who thinks they have a strong infringement case isn’t going to stop at the guy who authored the code, they’re going to go after anyone with deep pockets with a good chance of winning.

20 hours ago

Marha01

> Anyone distributing infringing material can be liable

There is still the "mens rea" principle. If you distribute infringing material unknowingly, it would very likely not result in any penalties.

10 hours ago

noosphr

Tab complete does not produce copyrightable material either. Yet we don't require software to be written in nano.

a day ago

rpdillon

This is a nice point that I haven't seen before. It's interesting to regress AI to the simplest form and see how we treat it as a test for the more complex cases.

18 hours ago

Tomte

There is already lots and lots of non-GPL code in the kernel, under dozens of licenses, see https://raw.githubusercontent.com/Open-Source-Compliance/pac...

As long as everything is GPLv2-compatible it‘s okay.

14 hours ago

panzi

If the output is public domain it's fine as I understand it.

a day ago

galaxyLogic

Makes sense to me. But so anybody can take Public Domain code and place it under the GNU General Public License (by dropping it into a Linux source-code file)?

Surely the person doing so would be responsible for doing so, but are they doing anything wrong?

a day ago

robinsonb5

> Surely the person doing so would be responsible for doing so, but are they doing anything wrong?

You're perfectly at liberty to relicense public domain code if you wish.

The only thing you can't do is enforce the new license against people who obtain the code independently - either from the same source you did, or from a different source that doesn't carry your license.

a day ago

cwnyth

This is correct, and it's not limited to code. I can take the story of Cinderella, create something new out of it, copyright my new work, but Cinderella remains public domain for someone else to do something with.

If I use public domain code in a project under a license, the whole work remains under the license, but not the public domain code.

I'm not sure what the hullabaloo is about.

a day ago

manwe150

If someone else uses your exact same prompt to generate the exact same code, can you claim copyright infringement against them? If the output is possible to copyright, then you could claim their prompt is infringement (just like if it reproduced Harry Potter). If it isn’t copyrightable, then the kernel would not have legal standing to enforce the GPL on those lines of code against any future AI reproduction of them. The developers might need to show that the code is licensed under GPL and only GPL, otherwise there is the possibility the same original contributor (eg the AI) did permit the copy. The GPL is an imposed restriction on what the kernel can legally do with any code contributions. That seems legally complicated for some projects—probably not the kernel with the large amount of pre-AI code, but maybe it spells trouble for smaller newer projects if they want to sue over infringement. IANAL.

19 hours ago

robinsonb5

> If someone else uses your exact same prompt to generate the exact same code, can you claim copyright infringement against them?

No, because they've independently obtained it from the same source that you did, so their copy is "upstream" of your imposing of a new license.

Realistically, adding a license to public domain work is only really meaningful when you've used it as a starting point for something else, and want to apply your license to the derivative work.

13 hours ago

tomjen3

Be careful here - you cannot copyright a story, only the specific tangible form of the story.

9 hours ago

cwnyth

Which is why I used precise language: "copyright my new *work*."

29 minutes ago

jaggederest

The core thing about licenses, in general, is that they only grant new usage. If you can already use the code because it's public domain, they don't further restrict it. The license, in that case, is irrelevant.

Remember that licenses are powered by copyright - granting a license to non-copyrighted code doesn't do anything, because there's no enforcement mechanism.

This is also why copyright reform for software engineering is so important, because code entering the public domain cuts the gordian knot of licensing issues.

a day ago

miki123211

Linux code doesn't have to strictly be GPL-only, it just has to be GPL-compatible.

If your license allows others to take the code and redistribute it with extra conditions, your code can be imported into the kernel. AFAIK there are parts of the kernel that are BSD-licensed.

a day ago

sambaumann

Sqlite’s source code is public domain. Surely if you dropped the sqlite source code into Linux, it wouldn’t suddenly become GPL code? I’m not sure how it works

a day ago

jasomill

The Linux kernel would become a GPLv2-licensed derivative work of SQLite, but that doesn’t matter, because public domain works, by definition, are not subject to copyright restrictions.

Claiming copyright on an unmodified public domain work is a lie, so in some circumstances could be an element of fraud, but still wouldn’t be a copyright violation.

15 hours ago

martin-t

This ruling is IMO/IANAL based on lawyers and judges not understanding how LLMs work internally, falling for the marketing campaign calling them "AI" and not understanding the full implications.

LLM-creation ("training") involves detecting/compressing patterns of the input. Inference generates statistically probable output based on similarities of patterns to those found in the "training" input. Computers don't learn or have ideas, they always operate on representations, it's nothing more than any other mechanical transformation. It should not erase copyright any more than synonym substitution.

a day ago

supern0va

>LLM-creation ("training") involves detecting/compressing patterns of the input.

There's a pretty compelling argument that this is essentially what we do, and that what we think of as creativity is just copying, transforming, and combining ideas.

LLMs are interesting because that compression forces distilling the world down into its constituent parts and learning about the relationships between ideas. While it's absolutely possible (or even likely for certain prompts) that models can regurgitate text very similar to their inputs, that is not usually what seems to be happening.

They actually appear to be little remix engines that can fit the pieces together to solve the thing you're asking for, and we do have some evidence that the models are able to accomplish things that are not represented in their training sets.

Kirby Ferguson's video on this is pretty great: https://www.youtube.com/watch?v=X9RYuvPCQUA

a day ago

martin-t

So? Why should it be legal?

If people find this cool and wanna play with it, they can, just make sure to only mix compatible licenses in the training data and license the output appropriately. Well, the attribution issue is still there, so maybe they can restrict themselves to public domain stuff. If LLMs are so capable, it shouldn't limit the quality of their output too much.

Now for the real issue: what do you think the world will look like in 5 or 10 years if LLMs surpass human abilities in all areas revolving around text input and output?

Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded? Or will the rich reap most of the benefit while also simultaneously turning us into beggars?

Even if you assume 100% of the people doing intellectual work now will convert to manual work (i.e. there's enough work for everyone) and robots don't advance at all, that'll drive the value of manual labor down a lot. Do you have it gamed out in your head and believe somehow life will be better for you, let alone for most people? Or have you not thought about it at all yet?

a day ago

galaxyLogic

> Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded?

I think they should be rewarded more than they are currently. But isn't the GNU General Public License basically saying you can use such source-code without giving any rewards whatsoever?

But I see your point. The reward for Open Source developers is the public recognition for their works. LLMs can take that recognition away.

13 hours ago

Marha01

The best answer to those issues is still Basic Income.

10 hours ago

martin-t

UBI only means you won't starve or die of exposure. It doesn't mean that people who are already rich today won't become so obscenely rich tomorrow they are above the law or can change the law (and decide who gets medical treatment or even take your UBI away).

8 hours ago

timmmmmmay

fortunately, you aren't only operating on representations, right? lemme check my Schopenhauer right quick...

a day ago

shevy-java

But why should AI then be attributed if it is merely a tool that is used?

a day ago

lonelyasacloud

Having an honesty-based tag could be the only way to monitor impact or go after a fix in codebases if things go south.

That is, at the moment:

- Nobody knows for sure what agents might add and their long term effects on codebases.

- It's at best unclear that AI content in a codebase can be reliably determined automatically.

- Even if it's not malicious, at least some of its contributions are likely to be deleterious and pass undetected by human review.

a day ago

plmpsu

it makes sense to keep track of what model wrote what code to look for patterns, behaviors, etc.

a day ago

yrds96

AI tools can do the entire job, from finding the problem to implementing and testing a fix.

It's different from the regular single purpose static tools.

14 hours ago

hgoel

This is a good point but I'd take it in the opposite direction from the implication, we should document which tools were used in general, it'd be a neat indicator of what people use.

19 hours ago

streetfighter64

It isn't?

> AI agents MUST NOT add Signed-off-by tags. Only humans can legally certify the Developer Certificate of Origin (DCO).

They mention an Assisted-by tag, but that also contains stuff like "clang-tidy". Surely you're not interpreting that as people "attributing" the work to the linter?
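
The distinction between the two tags can be checked mechanically. Here is a minimal sketch, assuming trailers are simple `Key: value` lines in the last paragraph of a commit message. The Assisted-by value format is not reproduced in this thread, so it is treated as an opaque string, and `example-tool`, the sample commit and the function names are all made up for illustration.

```python
import re

# A trailer is assumed to be a "Key: value" line in the last paragraph.
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z-]*):\s*(?P<value>.+)$")


def parse_trailers(message):
    """Return (key, value) pairs found in the last paragraph of the message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match["key"], match["value"].strip()))
    return trailers


def check_commit(message):
    """Separate the human certification (Signed-off-by) from tool disclosure (Assisted-by)."""
    trailers = parse_trailers(message)
    signers = [v for k, v in trailers if k.lower() == "signed-off-by"]
    tools = [v for k, v in trailers if k.lower() == "assisted-by"]
    return {"has_signoff": bool(signers), "signers": signers, "assisted_by": tools}


# Hypothetical commit message; names and tool value are placeholders.
EXAMPLE = """\
subsys: fix hypothetical off-by-one

Body text describing the change.

Assisted-by: example-tool
Signed-off-by: Jane Developer <jane@example.org>
"""

result = check_commit(EXAMPLE)
print(result)
```

The point of keeping the two lists separate is that only the Signed-off-by entries say anything about who certifies the DCO; the Assisted-by entries just record which tools were involved.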

a day ago

ninjagoo

  > Signed-Off ...
  > The human submitter is responsible for:
    > Reviewing all AI-generated code
    > Ensuring compliance with licensing requirements
    > Adding their own Signed-off-by tag to certify the DCO
    > Taking full responsibility for the contribution

  > Attribution: ... Contributions should include an Assisted-by tag in the following format:

Responsibility assigned to where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.

I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.

Hopefully this will set the trend and provide definitive guidance for a number of devs who were seeing not only the utility of AI assistance but also the acrimony from some quarters, causing some fence-sitting.

a day ago

senko

> Expected no less from Torvalds

This was written by Sasha Levin referencing a Linux maintainers’ discussion.

13 hours ago

sourcegrift

Of all the documents, this one needed a proper attribution with link to meeting minutes

11 hours ago

corbet

Meeting minutes: https://lwn.net/Articles/1049830/

2 hours ago

maxboone

See the commit message: https://github.com/torvalds/linux/commit/78d979db6cef557c171...

9 hours ago

bsimpson

Signed-off-by is already a custom/formality that is surely cargo-culted by many first-time/infrequent contributors. It has an air of "the plans were on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard.'" There's no way to assert that every contributor has read a random document declaring what that line means in kernel parlance.

I recently made a kernel contribution. Another contributor took issue with my patch and used it as the impetus for a larger refactor. The refactor was primarily done by a third contributor, but the original objector was strangely insistent on getting the "author" credit. They added our names at the bottom in "Co-developed-by" and "Signed-off-by" tags. The final submission included bits I hadn't seen before. I would have polished it more if I had.

I'm not raising a stink about it because I want the feature to land - it's the whole reason I submitted the first patch. And since it's a refactor of a patch I initially submitted (and "Signed-off-by,") you can make the argument that I signed off on the parts of my code that were incorporated.

But so far as I can tell, there's nothing keeping you from adding "Co-developed-by" and "Signed-off-by Jim-Bob Someguy" to the bottom of your submission. Maybe a lawyer would eventually be mad at you if Jim-Bob said he didn't sign off.

There's no magic pixie dust that gives those incantations legal standing, and nothing that keeps LLMs from adding them unless the LLMs internalize the new AI guidance.

17 hours ago

rwmj

The way you describe it, the developers all did the right thing. You contributed something to the patch, and even if it wasn't in your preferred final form (and it's basically never going to be for a kernel contribution of any significance), you were correctly credited.

If you didn't want to be credited you should have said.

Signed-off-by probably has some legal weight. When you add that to code you are making a clear statement about the origins of the code and that you have legal authority to contribute it - for example, that you asked your company for permission if needed. As far as I know none of this has been tested in court, but it seems reasonable to assume it might be one day.

14 hours ago

bsimpson

The problem is they've got a doc that declares "when you say balacalaboozy, you're declaring that a specific set of legal conditions is met. You must say balacalaboozy to proceed."

Newcomers see everyone saying balacalaboozy, so they say it too. It doesn't mean that they have read or agree to the doc that declared its meaning.

LLMs are the world's most sophisticated copycats. Surely they too will parrot balacalaboozy, unless their training is updated to include, understand, and consistently follow these new guidelines.

6 hours ago

bonzini

You can write in AGENTS.md to ask the user for explicit sign off and to explain the document to the user.

3 hours ago

zahlman

> You contributed something to the patch, and even if it wasn't in your preferred final form (and it's basically never going to be for a kernel contribution of any significance), you were correctly credited.

I don't see how the "signed-off-by" attestation constitutes correct credit here. It's claiming that GP saw the final result and approved of it, which is apparently false.

10 hours ago

bonzini

Signed-off-by is a chain. The second person asserts that they delegate to the first person for the parts contributed by the first, and signs off on the ones that were contributed personally.

Hypothetically in court you'd go to the last, ask "did you write this" and only if not go up.
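
As a rough illustration of that "start at the last and go up" walk, here is a sketch. The commit message, the signer names and the `wrote_it` answers are all hypothetical; this only models the order of questioning described above, not any real legal process.

```python
def signoff_chain(message):
    """Return Signed-off-by values in the order they appear (earliest signer first)."""
    chain = []
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "signed-off-by":
            chain.append(value.strip())
    return chain


def first_claimed_author(chain, wrote_it):
    """Walk from the last signer back toward the first.

    wrote_it maps a signer to their (hypothetical) answer to
    "did you write this?"; the walk stops at the first yes.
    """
    for signer in reversed(chain):
        if wrote_it.get(signer, False):
            return signer
    return None


# Hypothetical commit message with a three-link chain.
MESSAGE = """\
subsys: refactor hypothetical helper

Signed-off-by: Alice Author <alice@example.org>
Signed-off-by: Bob Refactorer <bob@example.org>
Signed-off-by: Carol Maintainer <carol@example.org>
"""

chain = signoff_chain(MESSAGE)
answers = {"Bob Refactorer <bob@example.org>": True}
print(first_claimed_author(chain, answers))
```

Here Carol, the last signer, says no, so the walk moves up to Bob, who says yes; Alice is never asked.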

3 hours ago

sheepscreek

This is the right way forward for open-source. Correct attribution - by tightening the connection between agents and the humans behind them, and putting the onus on the human to vet the agent output. Thank you Linus.

7 hours ago

ipython

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

a day ago

oytis

How is one supposed to ensure license compliance while using LLMs which do not (and cannot) attribute sources having contributed to a specific response?

11 hours ago

Lapel2742

> How is one supposed to ensure license compliance while using LLMs which do not (and cannot) attribute sources having contributed to a specific response?

Additionally there seems to be a general problem with LLM output and copyright[1]. At least in Germany. LLM output cannot be copyrighted and the whole legal field seems under-explored.

> This immediately raises the question of who is the author of this work and who owns the rights to it. Various solutions are possible here. It could be the user of the AI alone, or it could be a joint work between the user and the AI programmer. This question will certainly keep copyright experts in the various legal systems busy for some time to come.

It seems that in the long run the kernel license might become unenforceable if LLM output is used?!

[1] https://kpmg-law.de/en/ai-and-copyright-what-is-permitted-wh...

11 hours ago

theshrike79

Either you allow LLM generated + human reviewed code or people start hiding AI use.

...and then people start going "that's AI" on every single piece of code, seeing AI generated code left and right - like normal people claim every other picture, video or piece of text is "AI".

IMO it's a lot better to let people just openly say "this code was generated with AI assistance", but still sign off on it. Because "Your job is to deliver code you have proven to work": https://simonwillison.net/2025/Dec/18/code-proven-to-work/

3 hours ago

MyUltiDev

Reading this right after the Sashiko endorsement is a bit jarring. Greg KH greenlit an AI reviewer running on every patch a couple weeks back, and that direction actually seems to be helping, while here the conversation is still about whether contributors will take responsibility for AI code they submit. That feels like the harder side to police. The bugs that land kernel teams in trouble are race conditions, locking, lifetimes, the things models are most confidently wrong about. I have seen agents produce code that compiles cleanly, reads fine on a Friday review, then deadlocks under contention three weeks later. Is this contributor policy supposed to be the long term answer, or a placeholder until something Sashiko-shaped does the heavy filtering on the maintainer side too?

5 hours ago

sarchertech

This does nothing to shield Linux from responsibility for infringing code.

This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.

It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.

a day ago

zarzavat

Shield from what exactly? The Linux kernel is not a legal entity. It's a collection of contributions from various contributors. There is the Linux Foundation but they do not own Linux.

If Linux were to contain 3rd party copyrighted code the legal entity at risk of being sued would be... Linux users, which given how widely deployed Linux is is basically everyone on Earth, and all large companies.

Linux development is funded by large companies with big legal departments. It's safe to say that nobody is going to be picking this legal fight any time soon.

13 hours ago

sarchertech

The Linux DCO system was designed to shield Linus and the Linux Foundation from copyright and patent infringement liability, so they were certainly worried that it was a possibility.

However, there is no legal precedent that says that because contributors sign a DCO and retain copyright, the Linux Foundation is not liable. The entire concept is unproven.

Large company legal departments aren’t a shield against this kind of thing. Patent trolls routinely go after huge companies and smaller companies routinely sue much larger ones over copyright infringement.

3 hours ago

lukeify

An open-source project receiving open-source contributions from (often anonymous) volunteers is not even close to analogous to a storefront selling products with a consumer guarantee they are backing on the basis of their supply chain.

11 hours ago

sarchertech

Do you think that Goodwill should be able to offload all liability for everything they sell at their thrift shops to their often anonymous donors?

Linus makes $1.5 million per year from the Linux Foundation. And the foundation itself pulls in $300 million a year in revenue.

They are directly benefiting from contributors and if they cause harm through their actions there’s a good chance they’ll be held liable.

10 minutes ago

SirHumphrey

Quite a lot of companies use and release AI written code, are they all liable?

a day ago

sarchertech

1. Almost definitely if discovered

2. Infringement in closed source code isn’t as likely to be discovered

3. OpenAI and Anthropic enterprise agreements agree to indemnify (pay for damages essentially) companies for copyright issues.

a day ago

theshrike79

What would be "discovered" exactly? You can't patent a basic CRUD application.

There has to be an analogy to music or something here - except that code is even less copyrightable than melodies.

Yes, there might be some specific algorithms that are patented, but the average programmer won't be implementing any of those from scratch, they'll use libraries anyway.

3 hours ago

sarchertech

I’m not talking patents. Code is 100% copyrightable.

Code being copyrightable is the entire basis for open source licenses.

3 hours ago

theshrike79

s/patent/copyright/ in my comment then.

What part of a bog-standard HTTP API can be copyrighted? Parsing the POST request or processing it or shoving it to storage? I'm genuinely confused here and not just being an ass.

There are unique algorithms for things like media compression etc, I understand copyrighting those.

But for the vast majority of software, is there any realistic threat of hitting any copyrighted code that's so unique it has been copyrighted and can be determined as such? There are only so many ways you can do a specific common thing.

I kinda think of it like music, without ever hearing a specific song you might hit the same chord progressions by accident because in reality there are only so many combinations you can make with notes that sound good.

37 minutes ago

nitwit005

Yep, and honestly it's going to come up with things other than lawsuits.

I've worked at a company that was asked as part of a merger to scan for code copied from open source. That ended up being a major issue for the merger. People had copied various C headers around in odd places, and indeed stolen an odd bit of telnet code. We had to go clean it up.

a day ago

LtWorf

Headers are normally fine. GPL license recognises that you might need them to read binary files.

10 hours ago

testing22321

> This does nothing to shield Linux from responsibility for infringing code.

It’s no worse than non-AI assisted code.

I could easily copy-paste proprietary code, sign my name that it’s not and that it complies with the GPL and submit it.

At the end of the day, it just comes down to a lying human.

18 hours ago

sarchertech

That’s the difference. In practice a human has to commit fraud to do this.

But a human just using an LLM to generate code will do it accidentally. The difference is that regurgitation of training text is a documented failure mode of LLMs.

And there’s no way for the human using it to be aware it’s happening.

2 hours ago

testing22321

You can not accidentally sign your name saying “this code is GPL compliant”

If you can’t be sure, don’t sign.

43 minutes ago

agentultra

How do the reviewers feel about this? Hopefully it won't result in them being overwhelmed with PRs. There used to be a kind of "natural limit" to error rates in our code given how much we could produce at once and our risk tolerance for approving changes. Given empirical studies on informal code review which demonstrate how ineffective it is at preventing errors... it seems like we're gearing up to aim a fire-hose of code at people who are ill-prepared to review code at these new volumes.

How long until people get exhausted with the new volume of code review and start "trusting" the LLMs more without sufficient review, I wonder?

I don't envy Linus in his position... hopefully this approach will work out well for the team.

3 hours ago

newsoftheday

> All code must be compatible with GPL-2.0-only

How can you guarantee that will happen when AI has been trained on a world full of multiple licenses and even closed source material without permission of the copyright owners... I confirmed that with several AIs just now.

a day ago

philipov

You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.

a day ago

sarchertech

There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.

The whole "use it, but if it behaves as expected, it’s your fault" stance is ridiculous.

a day ago

philipov

If you think it's an unacceptable risk to use a tool you can't trust when your own head is on the line, you're right, and you shouldn't use it. You don't have to guarantee anything. You just have to accept punishment.

a day ago

sarchertech

That’s just it though: it’s not just your head. The liability could very likely also fall on the Linux Foundation.

You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.

a day ago

philipov

This policy effectively punts on the question of what tools were used to create the contribution, and states that regardless of how the code was made, only humans may be considered authors.

From the foundation's point of view, humans are just as capable of submitting infringing code as AI is. If your argument is sound, then how can Linux accept contributors at all?

EDIT: To answer my own question:

    Instead of a signed legal contract, a DCO is an affirmation that a certain person confirms that it is (s)he who holds legal liability for the act of sending of the code, that makes it easier to shift liability to the sender of the code in the case of any legal litigation, which serves as a deterrent of sending any code that can cause legal issues.

This is how the Foundation protects itself, and the policy is that a contribution must have a human as the person who will accept the liability if the foundation comes under fire. The effectiveness of this policy (or not) doesn't depend on how the code was created.

a day ago

sarchertech

Anyone distributing copyrighted material can be liable; that DCO isn’t going to stop anyone.

If that worked, any corporation that wanted to use code they legally couldn’t could just use a fork from someone who assumed responsibility, and worst case they’d have to stop using it if someone found out.

20 hours ago

testing22321

> liability could very likely also fall on the Linux Foundation.

It’s just the same as if I copy-paste proprietary code into the kernel and lie about it being GPL.

Is the Linux Foundation liable there?

18 hours ago

sarchertech

Maybe. DCOs haven’t been tested. But you can at least say that the person who did this committed fraud and that you had no reasonable way to know they would do that.

LLMs can and do regurgitate code without the user’s knowledge. That’s the problem, the user has no way to mitigate against it. You’re telling contributors “use this thing that has a random chance of creating infringing code”. You should have foreseen that would result in infringing code making its way into the kernel.

2 hours ago

testing22321

If someone sent you some code and said “it’s all good bro, you can put it in the kernel with your name on it”, would you?

If you don’t feel comfortable about where some code has come from, don’t sign your name.

The fact LLMs exist and can generate code doesn’t change how you would behave and sign your name to guarantee something.

44 minutes ago

empath75

The only lawsuits so far have been over training on open source software. You're inventing a liability problem that essentially does not exist.

a day ago

sarchertech

OpenAI and Anthropic added an indemnity clause to their enterprise contracts specifically to cover this scenario because companies wouldn’t adopt otherwise.

20 hours ago

streetfighter64

Yeah, but that's not a useful thing to do because not everybody thinks about that or considers it a problem. If somebody's careless and contributes copyrighted code, that's a problem for Linux too, not only the author.

For comparison, you wouldn't say, "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down", because then of course somebody would be careless enough to build a bridge that falls down.

Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.

a day ago

philipov

It was already necessary to solve the problem of humans contributing infringing code. It was solved by having contributors assume liability with a DCO. The policy being discussed today asserts that, because AI may not be held legally liable for its contributions, AI may not sign a DCO. A human signature is required. This puts the situation back to what it was with human contributors. What you are proposing goes beyond maintaining the status quo.

a day ago

sarchertech

It’s not solved. It hasn’t been tested in court to my knowledge and in my opinion is unlikely to hold up to serious challenge. You can be held liable for just distributing copyrighted code even if the whole “the Linux Foundation doesn’t own anything” argument holds up.

20 hours ago

jcelerier

> Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.

that's assuming that the problems and incentives are the same for everyone. Someone whose uncle happens to own a bridge repair company would absolutely be incentivized to say

> "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down"

11 hours ago

SV_BubbleTime

>There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.

I guess we’ll need to reevaluate what copyrights mean when derivatives grow on trees?

an hour ago

adikso

Their position is probably that LLM technology itself does not require training on code with incompatible licenses, and they probably also tend to avoid engaging in the philosophical debate over whether LLM-generated output is a derivative copy or an original creation (like how humans produce similar code without copying after being exposed to code). I think that even if they view it as derivative, they're being pragmatic - they don't want to block LLM use across the board, since in principle you can train on properly licensed, GPL-compatible data.

a day ago

newsoftheday

> That means if the AI messes up

I'm not talking about maintainability or reliability. I'm talking about legal culpability.

a day ago

benatkin

If they merge it in despite it having the model version in the commit, then they're arguably taking a position on it too - that it's fine to use code from an AI that was trained like that.

15 hours ago

XYen0n

Even human developers are unlikely to have only ever seen GPL-2.0-only code.

12 hours ago

tmalsburg2

Humans will not regurgitate longer segments of code verbatim. Even if we wanted to, we couldn’t do it because our memory doesn’t work that way. An LLM, on the other hand, can totally do that, and there’s nothing you can do to prevent it.

11 hours ago

johanyc

LLMs can, but do they? Is there any evidence that they spit out a piece of code verbatim without being explicitly prompted to do so? In NYT v. OpenAI, for example, NYT intentionally prompted to circumvent OpenAI's guardrails to show NYT articles.

6 hours ago

tmp10423288442

Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.

a day ago

Luker88

NIT: All AI code satisfies the GPL license.

Anything generated by an AI is public domain. You can include public domain in your GPL code.

I would urge some stronger requirement with the help of a lawyer. You only need a comment like "completely coded by AI, but 100% reviewed by me" to make that code's license worthless.

The only AI-generated parts that are copyrightable are the ones modified by a human.

I am afraid that this "waters down" the actual licensed code.

...We should start opening issues on "100% vibecoded" projects for relicensing to public domain to raise some awareness to the issue.

13 hours ago

manquer

> Anything new generated by an AI is public domain[1]

Language models do generate, character for character, existing code on which they were trained. The training corpus usually contains code which is only source-available but is not FOSS licensed.

Generated does not automatically mean novel or new, which is the bar needed for IP.

[1] Even this is not definitively ruled on in courts or codified in IP law and treaties yet.

6 hours ago

rao-v

A phenomenon I can not explain is the fact that this simple clean statement of a fairly obvious approach to AI assistance somehow took this long and Linus to state so cleanly.

Are there other popular repos with effectively this policy stated as neatly that I’ve missed?

3 hours ago

phillipcarter

We've had this for a while now: https://github.com/open-telemetry/community/blob/main/polici...

2 hours ago

bonzini

The wording might be more or less lawyerly but the idea is fairly common, e.g. https://openinfra.org/legal/ai-policy (OpenStack).

3 hours ago

HarHarVeryFunny

It's a sane policy - human is responsible for what they contribute, regardless of what tools they use in the development process.

However, the gotcha here seems to be that the developer has to say that the code is compatible with the GPL, which seems an impossible ask, since the AI models have presumably been trained on all the code they can find on the internet regardless of licensing, and we know they are capable of "regenerating" (regurgitating) stuff they were trained on with high fidelity.

7 hours ago

theshrike79

Then we get to the Code of Theseus argument: if you take a piece of code and replace every piece of it with code that looks the same, is it still the original code?

Is an AI reimplementation a "clean room" implementation? What if the AI only generates pseudocode and a human implements the final code based on that? Etc etc ad infinitum.

Lawyers will be having fun with this philosophical question for a good decade.

3 hours ago

dataviz1000

This is discussed in the Linus vs Linus interview, "Building the PERFECT Linux PC with Linus Torvalds". [0]

[0] https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003

a day ago

globular-toast

Hardly "discussed", perhaps "mentioned". Sebastian is basically an entertainer who can plug things in to sockets.

11 hours ago

WhyNotHugo

Weird that they're co-opting the "Assisted-by:" trailer to tag software and model being used. This trailer was previously used to tag someone else who has assisted in the commit in some way. Now it has two distinct usages.

The typical trailer for this is "AI-assistant:".

5 hours ago

aprentic

I like this. It's an inversion of the old adage, "a poor craftsman blames his tools" and the corollary, "use the right tool for the job" (because a good craftsman chooses the appropriate tool).

You don't get to bang on a screw and blame the hammer.

5 hours ago

KronisLV

This is actually a pretty nice idea:

  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]

I feel like a lot of people will have an ideological opposition to AI, but that would lead to people sometimes submitting AI generated code with no attribution and just lying about it.

At the same time, I feel bad for all the people that have to deal with low quality AI slop submissions, in any project out there.

The rules for projects that allow AI submissions might as well state: "You need to spend at least ~10 iterations of model X review agents and 10 USD of tokens on reviewing AI changes before they are allowed to be considered for inclusion."

(I realize that sounds insane, but in my experience iterated review even by the same Opus model can help catch bugs in the code. I feel like next-token prediction in and of itself is quite error prone; in other words, even Opus "writes" code that has bugs that its own review iterations catch.)
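For the trailer format quoted above, here is a minimal parsing sketch in Python. It is not the kernel's tooling. It assumes the square brackets in the template mean "optional" (so tools appear as bare words), and `ExampleAgent` and `example-model-1` in the usage note are made-up names.

```python
import re
from typing import NamedTuple, Optional


class AssistedBy(NamedTuple):
    agent: str
    model: str
    tools: list


# Assumed grammar: "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]",
# with the brackets read as "optional", so tools are plain whitespace-separated words.
_TRAILER = re.compile(
    r"^Assisted-by:\s*(?P<agent>[^\s:]+):(?P<model>\S+)(?P<tools>(?:\s+\S+)*)\s*$"
)


def parse_assisted_by(line: str) -> Optional[AssistedBy]:
    """Return the parsed trailer, or None if the line does not fit the assumed grammar."""
    m = _TRAILER.match(line.strip())
    if m is None:
        return None
    return AssistedBy(m["agent"], m["model"], m["tools"].split())
```

Under these assumptions, `parse_assisted_by("Assisted-by: ExampleAgent:example-model-1 sparse")` yields the agent, the model, and a one-element tool list, while a human-style value such as `Assisted-by: Jane Doe <jane@example.org>` returns `None` because it has no `AGENT:MODEL` field.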

8 hours ago

KaiLetov

The policy makes sense as a liability shield, but it doesn't address the actual problem, which is review bandwidth. A human signs off on AI-generated code they don't fully understand, the patch looks fine, it gets merged. Six months later someone finds a subtle bug in an edge case no reviewer would've caught because the code was "too clean."

15 hours ago

ugh123

> they don't fully understand, the patch looks fine

I don't get this part. Why is the reviewer signing off on it? AI code should be fully documented (probably more so than a human could) and require new tests. Code review gates should not change

15 hours ago

altmanaltman

I mean the same can happen with human-written code no? Reviewer signs off on it and subtle bug in edge case no one saw?

Or you mean the velocity of commits will be so much that reviewers will start making more mistakes?

13 hours ago

dec0dedab0de

All code must be compatible with GPL-2.0-only

Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?

a day ago

compyman

You might be being too pedantic :)

https://spdx.org/licenses/GPL-2.0-only.html It's a specific GPL license (as opposed to GPL-2.0-or-later)

a day ago

philipov

GPL-2.0-only is the name of a license. One word. It is an alternative to GPL-2.0-or-later.
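To make the "one word" point concrete, here is a small Python sketch that treats the SPDX identifier as a single opaque token. The function names are made up for illustration, and it only does exact string comparison; it does not evaluate compound SPDX expressions or decide license compatibility.

```python
import re

# Source files can carry a tag such as "// SPDX-License-Identifier: GPL-2.0-only".
# The pattern stops at "*" so a trailing "*/" is not captured.
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>[^*\n]+)")


def spdx_expression(header_text):
    """Return the license expression from the first SPDX tag found, or None."""
    m = _SPDX_TAG.search(header_text)
    return m["expr"].strip() if m else None


def is_gpl2_only(expr):
    # Exact token match: "GPL-2.0-only" and "GPL-2.0-or-later" are different names.
    return expr == "GPL-2.0-only"
```

With this sketch, a header tagged `GPL-2.0-or-later` is extracted fine but fails `is_gpl2_only`, which is the distinction the identifier is meant to encode.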

a day ago

kbelder

Right, the final hyphen changes the meaning of the sentence.

"GPL-2.0-only" "GPL-2.0 only"

a day ago

feverzsj

Linux is funded by all these big companies. Linus couldn't block AI pushes from them forever.

16 hours ago

becquerel

He's been vibecoding some stuff himself personally, on one of his scuba projects. You could take people as actually believing in the things they do and say.

9 hours ago

paganel

Correct, in the end big money talks.

10 hours ago

simianwords

This is some ridiculous cope.

2 hours ago

themafia

> All contributions must comply with the kernel's licensing requirements:

I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.

If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.

a day ago

Joel_Mckay

US legal consensus has set the precedent that "AI" output can't be copyrighted. Thus, technically no one can really own or re-license prompt output.

Re-licensing public domain uncopyrightable work as GPL/LGPL is almost certainly a copyright violation, and no different than people violating GPL/LGPL in commercial works.

Linus is 100% wrong on this choice, and has introduced a serious liability into the foundation upstream code. =3

https://en.wikipedia.org/wiki/Founder%27s_syndrome

https://www.youtube.com/watch?v=X6WHBO_Qc-Q

17 hours ago

kam

> Being in the public domain is not a license; rather, it means the material is not copyrighted and no license is needed. Practically speaking, though, if a work is in the public domain, it might as well have an all-permissive non-copyleft free software license. Public domain material is compatible with the GNU GPL.

https://www.gnu.org/licenses/license-list.html#PublicDomain

16 hours ago

Joel_Mckay

Yes, if it is clearly labeled as such, then GPL/LGPL licenced works may be included in such products. However, this relationship cannot make such works GPL without violating copyright, and isomorphic plagiarized code from an LLM doesn't magically become yours to re-license.

For example, one may use NASA public domain photos as you wish, but cannot register copyright under another license you find convenient to sue people. Also, if that public domain photo includes the Nutella trademark, it doesn't protect you from getting sued for violating Ferrero trademarks/patents/copyrights in your own use-case.

Very different than slapping a new label on something you never owned. =3

16 hours ago

noosphr

>Re-licensing public domain work as GPL/LGPL is almost certainly a copyright violation

Remember, kids: never get your legal advice from HN comments.

16 hours ago

Joel_Mckay

I hire specialized IP lawyers to advise me how to mitigate risk: One can't assign licenses on something no one can legally claim right to. You should do the same unless you live in India or China.

Don't become the cautionary tale, kid, as crawlers like sriplaw.com will be DMCA striking your public repos eventually. =3

https://www.youtube.com/watch?v=xkzy_420hts

16 hours ago

KhayaliY

We've seen in the past, for instance in the world of compliance, that if companies/governments want something done or make a mistake, they just have a designated person act as scapegoat.

So what's preventing lawyers/companies having a batch of people they use as scapegoats, should something go wrong?

a day ago

zxexz

I like this. It's just saying you have responsibility for the tools you wield. It's concise.

Side note, I'm not sure why I feel weird about having the string "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]" in the kernel docs source :D. Mostly joking. But if the Linux kernel has it now, I guess it's the inflection point for...something.

15 hours ago

deadbabe

How can we automate the disclosure of what AI agent was used in a PR and the extent of code? Would be nice to also have an audit of prompts used, as that could also be considered “code”.
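One partial answer, sketched in Python: if contributors record an `Assisted-by:` trailer, a script can tally which agent and model were disclosed across a set of commit messages. This is only a sketch. It assumes the messages were already collected (for example from `git log --format=%B`), the `AGENT:MODEL` versus `Name <email>` heuristic is an assumption, and it says nothing about how much of the code was generated or which prompts were used.

```python
from collections import Counter


def disclosure_summary(commit_messages):
    """Count commits per disclosed AGENT:MODEL, based on Assisted-by trailers."""
    counts = Counter()
    for message in commit_messages:
        seen = set()  # count each agent:model at most once per commit
        for line in message.splitlines():
            key, sep, value = line.partition(":")
            if not sep or key.strip().lower() != "assisted-by":
                continue
            fields = value.split()
            # Heuristic (assumption): a tool entry starts with AGENT:MODEL,
            # while a human entry looks like "Name <email>".
            if fields and ":" in fields[0] and "<" not in value:
                seen.add(fields[0])
        counts.update(seen)
    return counts
```

The result is a `Counter` keyed by the disclosed `AGENT:MODEL` string, so it only reflects what contributors chose to write in the trailer.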

8 hours ago

bharat1010

Honestly kind of surprised they went this route -- just 'you own it, you're responsible for it' is such a clean answer to what feels like an endlessly complicated debate.

17 hours ago

lowsong

At least it'll make it easy to audit and replace it all in a few years.

a day ago

martin-t

This feels like the OSS community is giving up.

LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].

The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which

1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies

2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking[2] even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://news.ycombinator.com/item?id=47356000

[1]: http://prize.hutter1.net/

[2]: https://en.wikipedia.org/wiki/ELIZA_effect

[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...

a day ago

ninjagoo

> Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

I think you'll find that this is not settled in the courts, depending on how the data was obtained. If the data was obtained legally, say a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).

Morally the case gets interesting.

Historically, there was no such thing as copyright. The English 1710 Statute of Anne establishing copyright as a public law was titled 'for the Encouragement of Learning' and the US Constitution said 'Congress may secure exclusive rights to promote the progress of science and useful arts'; so essentially public benefits driven by the grant of private benefits.

The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?

The more people that copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1]

You'll do it for the kudos. [2][3]

  *Post-Scarcity Future. 
  [1] https://en.wikipedia.org/wiki/Post-scarcity
  [2] https://en.wikipedia.org/wiki/The_Quiet_War, et al.
  [3] https://en.wikipedia.org/wiki/Accelerando

a day ago

martin-t

> The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?

Yes.

I have 2 issues with "post-scarcity":

- It often implicitly assumes humanity is one homogeneous group where this state applies to everyone. In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases. All else being equal, I'd prefer being in the first group and my chance for that is being economically relevant.

- It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have. The second group is the largest cause of exploitation and suffering in the world. And the second group will continue existing in a post-scarcity world and will work hard to make scarcity a real thing again.

---

Back to your question:

I made the mistake of publishing most of my public code under GPL or AGPL. I regret it because even though my work has brought many people some joy and a bit of my work was perhaps even useful, it has also been used by people who actively enjoy hurting others, who have caused measurable harm and who will continue causing harm as long as they're able to - in a small part enabled by my code.

Permissive licenses are socially agnostic - you can use the work and build on top of it no matter who you are and for what purpose.

(A)GPL is weakly pro-social - you can use the work no matter what but you can only build on top of it if you give back - this produces some small but non-zero social pressure (enforced by violence through governments) which benefits those who prefer cooperation instead of competition.

What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good, not having committed any serious offenses, not taking actions to restrict other people's rights without a valid reason, etc.

There have been attempts in this direction[0] but not very successful.

In a world without LLMs, I'd be writing code using such a license but more clearly specified, even if I had to write my own. Yes, a lawyer would do a better job, but that does not mean anything written by a non-lawyer is completely unenforceable.

With LLMs, I have stopped writing public code at all because the way I see it, it just makes people much richer than me even richer at a much faster rate than I can ever achieve myself. It just makes inequality worse. And with inequality, exploitation and oppression tend to soon follow.

[0]: https://json.org/license.html

21 hours ago

ninjagoo

> In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases.

By definition, that's not a post-scarcity world; and that's already today's world.

> It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have.

Do you think that's genetic, or environmental? Either way, maybe it will have been trained out of the kids.

> it has also been used by people who actively enjoy hurting others, who have caused measurable harm

Taxes work the same way too. "The Good Place" explores these second-order and higher-order effects in a surprisingly nuanced fashion.

Control over the actions of others, you have not. Keep you from your work, let them not.

> What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good

These are all things necessary in a society with scarcity. Will they be needed in a post-scarcity society that has presumably solved all disorder that has its roots in scarcity?

> With LLMs, I have stopped writing public code at all because the way I see it, it just makes people much richer than me even richer at a much faster rate than I can ever achieve myself.

Yes, the futility of our actions can be infuriating, disheartening, and debilitating. The story about the chap who was tossing washed-ashore starfish back one by one comes to mind. There were thousands. When asked why he did this futile task, since he couldn't throw them all back, he answered as he threw the next ones: it matters to this one, it matters to this one, ...

Hopefully, your code helped someone. That's a good enough reason to do it.

19 hours ago

martin-t

> trained out of the kids

I don't think you understand how children work.

You probably imagine some Brave New World kind of conditioning. Not to mention, those people will want their kids to have those traits.

> Hopefully, your code helped someone. That's a good enough reason to do it.

No. That's like saying that the V2 rocket program helped keep a bunch of people out of the gas chambers.

We should absolutely do our best to make sure our work does more good than harm, not just that it does some good.

EDIT: I am sad to see your other comment below flagged/dead. HN does not like the idea that a lowly open source contributor could take their phones and computers away from them for petty things like genocide, murder or rape...

3 hours ago

KK7NIL

> I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

We're well past the Turing test now. Whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from that of a sentient creature, especially when it comes to programming.

a day ago

davemp

> We're well past the Turing test now

Nope, there is no “The” Turing Test. Go read his original paper before parroting pop sci nonsense.

The Turing test paper proposes an adversarial game to deduce if the interviewee is human. It’s extremely well thought out. Seriously, read it. Turing mentions that he’d wager something like 70% of unprepared humans wouldn’t be able to correctly discern in the near future. He never claims there to be a definitive test that establishes sentience.

Turing may have won that wager (impressive), but there are clear tells, similar to “how many r’s are in strawberry?”, that an informed interrogator could reliably exploit.

18 hours ago

martin-t

Would you say "assisted by vim" or "assisted by gcc"?

It should be either something like "(partially/completely) generated by" or if you want to include deterministic tools, then "Tools-used:".

The Turing test is an interesting thought experiment but we've seen it's easy for LLMs to sound human-like or make authoritative and convincing statements despite being completely wrong or full of nonsense. The Turing test is not a measure of intelligence, at least not an artificial one. (Though I find it quite amusing to think that the point at which a person chooses to refer to LLMs as intelligence is somewhat indicative of his own intelligence level.)

> whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming

It absolutely makes a difference: you can't own a human but you can own an LLM (or a corporation, which is IMO just as wrong as owning a human).

Humans have needs which must be continually satisfied to remain alive. Humans also have a moral value (a positive one - at least for most of us) which dictates that being rendered unable to remain alive is wrong.

Now, what happens if LLMs have the same legal standing as humans and are thus able to participate in the economy in the same manner?

a day ago

zbentley

If a linter insists on a weird line of code, I’m probably commenting that line as “recommended by whatever-linter”, yes.

a day ago

martin-t

I wouldn't but I can see why some people would.

I can't point out clearly where I draw the line, but here's one difference I notice:

A recommendation can be both a thing and an action. A piece of text is a recommendation and it does not matter how it was created.

Assistance implies some parity in capabilities and cooperative work. Also it can pretty much only be an action, you cannot say "here is some assistance" and point to a thing.

21 hours ago

williamcotton

"Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0]."

That LLM response is describing a specific project with full attribution.

9 hours ago

martin-t

And it proves the code is stored (in a compressed form) in the model.

7 hours ago

williamcotton

So what's the legal issue here?

> How does the chardet achieve this? Explain in detail, with shortened code excerpts from the library itself if helpful to the explanation.

The prompt is explicitly requesting the source!

5 hours ago

tmp10423288442

On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.

a day ago

martin-t

The comment says "Opus 4.6 without tool use or web access"

a day ago

user34283

For [0], it was supposedly shown to do it when specifically prompted to do so.

Despite agentic tools being used by millions of developers now, I am not aware of a single real case where accidental reproduction of copyrightable code has been an issue.

Further, some model providers offer indemnity clauses.

It seems like a non-issue to me, practically.

12 hours ago

shevy-java

Fork the kernel!

Humans for humans!

Don't let skynet win!!!

a day ago

aruametello

> Fork the kernel!

pre "clanker-linux".

I am more intrigued by the inevitable Linux distro that will refuse any code that has AI contributions in it.

a day ago

pawelmurias

Tardux Linux

10 hours ago

baggy_trough

Sounds sensible.

a day ago

spwa4

Why does this file have an extension of .rst? What does that even mean for the fileformat?

a day ago

jdreaver

https://en.wikipedia.org/wiki/ReStructuredText

This format really took off in the Python community in the 2000s for documentation. The Linux kernel has used it for documentation as well for a while now.

a day ago

adikso

reStructuredText. Just like you have .md files everywhere.

a day ago

SV_BubbleTime

Everyone missed a great opportunity to lie to you and tell you that the Linux kernel now requires you to program in Rust.

17 hours ago

bitwize

Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.

a day ago

vips7L

It’s perfectly reasonable. We’ve been doing it for decades. It’s completely unreasonable to expect every developer to use “ai”, especially when it comes at such a heavy monetary cost.

6 hours ago

NetOpWibby

inb4 people rage against Linux

a day ago

SV_BubbleTime

Scroll down, some nerds have no chill.

17 hours ago

NetOpWibby

Good grief

15 hours ago

gnarlouse

I wonder if this is happening because of Mythos

15 hours ago

rwmj

Interesting that coccinelle, sparse, smatch & clang-tidy are included, at least as examples. Those aren't AI coding tools in the normal sense, just regular, deterministic static analysis / code generation tools. But fine, I guess.

We've been using Co-Developed-By: <email> for our AI annotations.
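
For anyone scripting around annotations like these, here is a minimal sketch in Python of pulling trailers out of a commit message. It assumes trailers are `Key: value` lines in the final paragraph of the message. The function names, the example message, and the `example-tool` value are all made up for illustration; this is not how git or any kernel tooling actually parses them.

```python
import re

# Assumption: a trailer is a single "Key: value" line with a hyphenated key.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # subject line only, so there is no trailer block
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []  # final paragraph is prose, not a trailer block
        trailers.append((match.group(1), match.group(2)))
    return trailers


def tool_trailers(message: str, keys=("assisted-by", "co-developed-by")):
    """Keep only the trailers whose key (case-insensitive) is in `keys`."""
    return [(k, v) for k, v in parse_trailers(message) if k.lower() in keys]


# Hypothetical commit message, for illustration only.
EXAMPLE = """fix off-by-one in ring buffer

Found while reviewing a static analysis report.

Assisted-by: example-tool
Signed-off-by: Jane Doe <jane@example.com>
"""

if __name__ == "__main__":
    for key, value in tool_trailers(EXAMPLE):
        print(f"{key}: {value}")
```

Matching keys case-insensitively means `Co-Developed-By` and `Co-developed-by` are treated the same, which seemed the safer choice given how inconsistently people capitalize these.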

14 hours ago