The Creative Spark is Not Just Words on a Page
An Author-Lawyer's Take on the Anthropic Settlement
When I published Compendium, I was proudest not of the words themselves, but of the years of thinking, free writing, and fleshing out the story and characters that made those words possible. It was all of that private scaffolding that never appears on the page but lives on in my mind as the intellectual fruits of my labor. The very first spark hit me in an ordinary moment one day at the office. I was setting up a Kindle Fire that a few of us had purchased for my then-legal assistant’s birthday. As I clicked through menus and added her account, the blank device populated itself with knowledge and other people’s stories. It made me think about what it would feel like to pick up a book or a computer for the first time and realize the world’s knowledge is at one’s fingertips. That seed eventually became Compendium after two more years of free writing, world building, plotting, writing, revising, and editing.
That’s why the recent author class action against Anthropic has felt personal. It IS personal to me and to thousands of other authors. To us, this case isn’t only about text files used to train an LLM. This case is about whether the long arc of a writer’s labor, my labor, can legally be treated as free input for machines.
On September 5, 2025, Anthropic told a federal court it would pay $1.5 billion to settle the certified class action lawsuit brought by authors alleging their books were pulled from pirate libraries and used to build Claude. The settlement was quickly approved, and cue the headlines. They focus on the size of the settlement, of course. Yes, it is a landmark for copyright lawsuits. But for working writers (like me), and for lawyers advising creative clients (also me), the shape of the settlement matters more than the number.
At the end of the day, what does this settlement actually mean?
Money for each book. The settlement creates a non-reversionary fund with payments allocated per qualifying book. It’s a blunt, cold way of accounting for the collective years of sweat and tears (and maybe blood, if you like to work in hard copy) that each and every author puts into a book, but it’s at least a concrete recognition that books are assets, not amorphous “content” free for the taking.
No “outputs” amnesty. Critically, the release is limited to past ingestion/retention/training conduct and does not release “output” claims (e.g., if a model later reproduces protected expression from a class work). As a lawyer, I read that as preserving a lane for future disputes over model behavior, rather than declaring a truce on everything AI might emit tomorrow. It’s not a perfect result, because it declines to protect authors as a class from the potential effects of the training that has already occurred. It leaves us to fend for ourselves as AI companies grow in funding (particularly among the billionaire class that has no qualms about asserting their rights to behave however they choose, because that’s what money means to them) and in scope.
Prospective hygiene. Anthropic must destroy the specific downloaded book files (from pirate sources) and confirm it, subject to preservation obligations. Even if you believe in broad fair use for training, this nods to a baseline: don’t warehouse pirate libraries. Really, this is the minimum bar that needs to be met. Collecting data you don’t own the rights to and using it to train models shouldn’t even be a question, but this settlement unfortunately reflects a very narrow interpretation of appropriate retention and use. If I pay $8 for an ebook, should I be able to plug it into an AI and train on it? As an author, I would say… no, that’s not what the license you’ve been granted encompasses. I think, at the end of the day, we’re going to have to see a very different licensing scheme for copyright and the licenses granted to readers who purchase a copy of a book. I don’t think it will be long before express bans on using a book’s contents to train LLMs appear on every book’s copyright page.
Who’s in? If your book appears on the works list, you’re in the class. I have linked to the public lookup tool. There are deadlines for claims, objections, and opt-outs. Also, be aware that you need an actual federal copyright registration for your work. Luckily, I did this for Compendium, but I’ve been hearing horror stories from other authors who thought the publisher was handling this registration for them and are now left high and dry. As authors, it is incumbent upon us to obtain our registered copyrights. If you want to know more about this, reach out. I’m happy to share my knowledge. Link to Anthropic Settlement Page.
The moment I realized Compendium was in the list of works
Earlier this year, when the preliminary public lookup tool went live, I typed in Compendium and held my breath. When it appeared on the list of pirated works that Anthropic had used, my heart sank. Part of me knew this was going to be the case before I even ran the search. Compendium had been pirated from an advance reader copy I’d provided to potential reviewers almost from the day the final ebook was ready. It was already bouncing around the internet before it was even released. Authors circulate ARCs because we have to get early reviews, buzz, and blurbs. But the ARC is also the soft underbelly of a book’s life cycle. Seeing that pre-public text end up as grist confirmed a fear I’d carried since the leak: my years-long work had been reduced to training fodder without consent.
Training vs. taking?
The court record in the Anthropic case reflects the unresolved tension many of us who are both authors and lawyers feel. Some judges have signaled that training use could be fair in the abstract. They may even have drawn a line at building and keeping a central library of millions of pirated books, but that’s a low bar to meet. It is also the crux of the ongoing tension: even if some downstream use might qualify as fair or transformative, how you acquire your data and what you retain can still violate rights. This settlement sidesteps a concrete answer on training as fair use and leaves output liability for another day. So, at the end of the day, the battle has barely begun.
As a lawyer, I’d tell my business clients that this settlement does NOT amount to a permission slip to train models on copyrighted material. As an author, I’d add that it’s going to be a long road to determine the scope and methods for ethically and lawfully training AI on copyrighted works.
Why Compendium (and every other book) is more than just words, and why that matters
When we talk about “intellectual property,” we default to treatises and the legal elements that determine whether a work is protected (whether by copyright, trademark, or patent), based on registration and the factors as applied by a judge. This is a necessary model for taking the output of the mind and making it concrete enough to protect, but it is ultimately insufficient. What Compendium embodies isn’t just expression. It is the accumulated judgment of a human mind (my mind!) over years. Datasets flatten that history. They treat the finished book as just another row in a table, a collection of words that form ideas on a specific subject, saved into a larger collection of words, without acknowledging the upstream investment that gave the text its coherence.
That’s why authors, myself included, reacted so strongly when allegations surfaced that pirated copies had been swept into training corpora. Consent and compensation are the foundation (maybe even the basement) of the law. Respect for provenance is the culture we build on top of that basic legal infrastructure, cobbled together to somehow draw objective concepts from inherently subjective outputs. The settlement’s destruction requirement and its preservation of output claims are small but meaningful steps toward that culture, but they are only the first steps, and it’s going to be a long battle for authors and publishers to establish and enforce that provenance.
If you’re an author, what should you do now?
Check the Works List and deadlines. If your titles qualify, file a claim on time, or consider whether to opt out if you have strategic reasons (e.g., you’re a big-name author with multiple potential claims already out there).
Inventory your rights posture. Registered vs. unregistered works, publishing contracts, prior assignments. These factors all matter and unfortunately affect your eligibility as well as any leverage you may have.
Track outputs. Keep examples of model behavior that appears to replicate protected expression from your works. The settlement doesn’t waive those theories!
If you’re advising AI or publishing clients
Audit the provenance of your training data, not just the licenses. “We got it on the open web” is not a policy. Model builders should be able to answer: From where? Under what terms?
Separate ingestion from training. Even courts that are open to fair-use and transformative-use arguments are signaling that how you assemble and retain data matters. Warehousing pirate libraries is radioactive, an immediate no-fly zone.
Design for authors’ agency. Expect claims about outputs and invest in retrieval logging, provenance signals, and responsive takedown pathways. If you want to be seen as a trustworthy actor with your AI house in order, this will be increasingly necessary. This battle is only beginning, and expect publishers and authors to keep the fight very much alive.
What I want, as both author and lawyer
I don’t want to stop technology. I actually love technology, or I wouldn’t have written a science fiction series about the cultural embrace and use of technology. What I want as an author and as a privacy attorney is consent, compensation, and care:
Consent: Use licensed sources or honor robust opt-out mechanisms that actually work.
Compensation: Collective licensing or revenue-sharing models that don’t force individual creators to litigate one by one.
Care: Provenance-respecting pipelines and a willingness to discard tainted datasets. The cost of maintaining those datasets is borne by the very people who make culture worth modeling, and if companies want a continuous influx of content worth modeling, they will need to protect the human brains behind that work.
The Anthropic settlement doesn’t resolve the philosophy of AI and art. It does sketch the very first line in a potential bounding box. You can’t treat a pirate library as free lumber. And it acknowledges, in hard dollars, that a book is not just a pile of words but the residue of a life’s attention.