Does AI Steal Artists’ Work?

Quick summary: Beginners in AI answers the question “Does AI Steal Artists’ Work?” with current research, expert perspectives, real-world examples, and practical implications for everyday users. Published by beginnersinai.org.

The answer is legally and ethically contested. AI image generators are trained on billions of images scraped from the internet — including copyrighted work — but whether that training constitutes “theft” depends on questions of copyright law that courts are still actively resolving as of early 2026.

Few questions in AI have generated more heat and less clarity than this one. Artists feel their work was taken without consent or compensation. AI companies argue training data is transformative fair use. Courts in the US, UK, and EU are reaching different conclusions. Somewhere in between is a policy answer that will shape the creative economy for decades. This article gives you both sides, the legal status, and where the debate is actually headed.

How AI Image Models Are Actually Trained

Generative image models like Stable Diffusion, Midjourney, and DALL-E are trained on datasets containing hundreds of millions to billions of images. LAION-5B — the dataset used to train Stable Diffusion — contained 5.85 billion image-text pairs scraped from the public web, including images from artist portfolios, stock photo sites, social media, and professional archives. The images are not stored in the final model; instead, the model learns statistical relationships between image features and text descriptions. When you prompt “a painting in the style of Greg Rutkowski,” the model synthesizes features from everything in its training data associated with that artist’s name.
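To make "learns statistical relationships, not stored copies" concrete, here is a deliberately toy sketch in Python. Real diffusion models are vastly more complex; the captions and three-number "feature vectors" below are invented for illustration. The point it demonstrates is structural: the trained "model" holds only per-word averages computed across many image-text pairs, so no individual training image survives inside it.

```python
# Toy illustration only: a "model" that keeps per-word aggregate statistics
# about its training pairs, never the training images themselves.
from collections import defaultdict

def train(pairs):
    """For each caption word, accumulate the mean of the image feature
    vectors it co-occurred with. Only these averages are retained."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for caption, feats in pairs:
        for word in caption.lower().split():
            counts[word] += 1
            for i, v in enumerate(feats):
                sums[word][i] += v
    return {w: [v / counts[w] for v in vec] for w, vec in sums.items()}

def generate(model, prompt):
    """'Generate' features for a prompt by averaging the learned statistics
    of its known words: a synthesis, not a lookup of any one image."""
    words = [w for w in prompt.lower().split() if w in model]
    if not words:
        return None
    return [sum(model[w][i] for w in words) / len(words) for i in range(3)]

training_pairs = [
    ("sunset painting", [0.9, 0.4, 0.1]),  # made-up 3-number "image features"
    ("sunset photo",    [0.8, 0.5, 0.2]),
    ("forest painting", [0.1, 0.7, 0.2]),
]
model = train(training_pairs)
# model["sunset"] is the average of its two training examples, close to
# [0.85, 0.45, 0.15]: it matches neither stored image exactly.
```

This is why "the images are not stored in the model" is technically true, and also why it does not settle the dispute: the statistics were still derived from copyrighted work, which is precisely what the lawsuits contest.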

Artists discovered this through tools like haveibeentrained.com, which allowed anyone to search whether their work appeared in LAION. Many prominent illustrators — including Greg Rutkowski, one of the most commonly cited style references in Stable Diffusion prompts — found thousands of their works in the training data without consent or notice. The emotional response was visceral: “my career’s work was consumed to build a machine that competes with me.”

The Lawsuits: What’s Actually Being Argued

In January 2023, a class-action lawsuit was filed against Stability AI, Midjourney, and DeviantArt by artists Sarah Andersen, Kelly McKernan, and Karla Ortiz. A parallel lawsuit was filed by Getty Images against Stability AI in both the US and UK. These cases are testing two distinct legal theories.

The first theory is direct copyright infringement during training: using copyrighted images to train a model is itself an infringing use. The second is output infringement: the model produces images that are substantially similar to copyrighted works. The first theory is the more novel and consequential one. If training on copyrighted data is itself infringement, the entire foundation of current AI image generation is legally compromised.

In February 2024, a US District Court dismissed most of the artists’ claims related to AI-generated output similarity — finding that the outputs were not substantially similar to any specific copyrighted work. However, the training data claims were allowed to proceed. As of early 2026, these cases remain unresolved at the appellate level. The Getty Images UK case has proceeded further, with a High Court ruling in late 2024 that Stability AI needed to disclose more about its training data sources.

The Fair Use Argument (AI Companies’ Position)

AI companies argue that training on publicly available data is “transformative fair use” — the same doctrine that allows Google to index web pages, allows researchers to study copyrighted works, and allows parody to quote the works it targets. Of the four fair use factors under US law, the decisive ones here are the purpose and character of the use (is it transformative?) and the effect on the market for the original. AI companies argue: training is transformative (the model learns statistical patterns, not copies of works); training is not commercial in the same way as selling copies; and the model does not reproduce any specific work, so there is no direct market substitution.

This argument has real legal support. The Supreme Court’s 2023 ruling in Andy Warhol Foundation v. Goldsmith narrowed the transformative use doctrine, potentially weakening AI companies’ position, but did not directly address AI training. The outcome of the Stability AI and Getty cases will be the defining precedent.

The Artists’ Position: Consent and Compensation

Artists’ advocates, led by organizations like the Graphic Artists Guild and the Society of Illustrators, argue that “technically legal” is the wrong standard. Even if training data use is ultimately found to be fair use, the ethical argument stands: artists’ life work was used to build commercial products without consent, notice, or compensation, and those products now compete directly with the artists’ livelihoods. The harm is real regardless of legal technicalities.

The proposed solutions vary: licensing fees distributed to a creator fund (similar to music performance royalties), opt-in rather than opt-out data collection, and watermarking or metadata standards that make AI training data provenance traceable. The EU AI Act (2024) includes provisions requiring AI providers to document and disclose their training data — a step toward accountability even short of compensation.

Adobe Firefly: The Licensed Alternative

Adobe took a deliberately different approach with Firefly, training exclusively on Adobe Stock images (for which Adobe holds or has licensed rights), openly licensed content, and public domain works. Adobe explicitly excludes Midjourney, Stable Diffusion, and other AI-generated images from Firefly’s training data to avoid contamination. Adobe compensates contributing artists through a bonus structure tied to Firefly usage of their content — though the payment amounts have been criticized as inadequate by contributors.

Adobe’s approach shows the alternative is viable: commercially competitive AI image generation can be built on licensed data. The question is whether the competitive pressure from cheaper-to-train models built on unlicensed data forces the entire industry toward the lower-cost but legally and ethically contested approach. If you need to use AI image generation for commercial work with clear legal standing, Adobe Firefly is currently the safest option. For creative experimentation, Open Art AI offers a range of model options with clear usage terms.

Where the Debate Is Actually Headed

The legal landscape will clarify significantly over 2026-2027 as appellate courts rule on the pending cases. The EU AI Act’s training data transparency requirements will create new precedents for provenance disclosure globally. The most likely long-term outcome: a licensing framework similar to music publishing, where AI companies pay into a collective licensing pool that compensates original creators — not because courts require it, but because the reputational and legal risk of the current approach becomes commercially unsustainable. See our broader coverage of AI ethics for the policy context and our article on whether AI is creative for the aesthetic dimensions of this debate.


Key Takeaways

  • AI image models were trained on billions of images from the web, including copyrighted work, without artist consent or compensation.
  • US courts as of early 2026 have dismissed output similarity claims but allowed training data claims to proceed — the core legal question is unresolved.
  • The fair use argument (training is transformative) has legal support but is contested; the Supreme Court’s 2023 Warhol ruling may weaken it.
  • Adobe Firefly demonstrates that commercially competitive AI image generation can be built on licensed data — the ethical alternative is viable.
  • The EU AI Act requires training data disclosure, creating pressure for provenance accountability even without a licensing mandate.

Frequently Asked Questions

Is it legal to use AI-generated images commercially?

It depends on the tool. Adobe Firefly images are explicitly licensed for commercial use with indemnification coverage. Midjourney’s paid tiers grant commercial rights to outputs. DALL-E 3 outputs are owned by the user under OpenAI’s terms and can be used commercially. The legal risk is that if training data lawsuits succeed, there could be retroactive implications — but current legal status in the US favors commercial use of outputs, with the training data question still pending.

Can I use an artist’s style in an AI prompt?

Technically yes — art styles are not copyrightable in the US, only specific works. You can legally prompt “in the style of Monet” or “in the style of [living artist].” Whether you should do so is an ethical question distinct from the legal one. Many artists find style prompting harmful to their livelihoods and consider it disrespectful even when legal. The professional creative community is developing norms around this that differ from legal standards.

Did Stability AI actually break the law?

Not definitively — the cases are still proceeding through courts. Stability AI argues its training constitutes fair use under US copyright law. Courts have not yet ruled definitively on the training data question. The UK High Court has been more skeptical of Stability AI’s position than US courts so far. A final ruling could go either way; most legal experts see this as a genuinely close question.

What is opt-out, and how does it work for artists?

Several AI companies now offer opt-out mechanisms for artists: haveibeentrained.com allows artists to request removal from LAION-based datasets, Spawning.ai maintains an opt-out registry that some AI companies have agreed to respect, and some platforms offer robots.txt-style instructions to signal non-consent to scraping. These are voluntary mechanisms, not legally binding, and their effectiveness varies by company and dataset.
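As an illustration of the robots.txt-style signals mentioned above, the plain-text fragment below blocks some publicly documented AI crawler user-agents (GPTBot for OpenAI, Google-Extended for Google's AI training, CCBot for Common Crawl). Note the caveats: honoring these directives is voluntary, the token list changes over time, and blocking future crawls does nothing about data already collected.

```text
# robots.txt: asks (but cannot force) AI training crawlers to skip this site.
# User-agent tokens are published by each crawler's operator.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```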

How should I feel about using AI art tools?

This is a values question, not just a facts question. If you want to minimize complicity in a contested practice: use Adobe Firefly (licensed training data) or open-source tools trained on licensed data. If you use other tools, consider what you generate and how — mass-producing content that directly competes with a specific living artist’s livelihood is different from experimental creative exploration. The debate is genuine, the harm to artists is real, and your choices as a user have small but real effects on where the industry goes.



Understanding who owns AI-generated outputs depends partly on understanding how AI systems are trained. Our article on whether AI understands what it writes explains the training process and what actually happens to the data AI models learn from.

Sources: Andersen v. Stability AI, No. 3:23-cv-00201 (N.D. Cal.); Getty Images v. Stability AI, High Court of England and Wales (2024); EU AI Act, Official Journal of the EU (2024); Andy Warhol Foundation v. Goldsmith, 598 U.S. 508 (2023); Grokipedia: AI and Copyright
