What’s needed: A decentralized copyright clearance authority

Draft of 2007.11.11

The legal standing of Project Gutenberg, and more importantly its crucial supplier Distributed Proofreaders, depends on the fact that the books we digitize and redistribute are in the public domain.

Copyright law is complicated. International copyright law is even more complicated—but even in the US, the rules for determining public domain status for works published after 1922 are frickin’ scary.

The practical response from Project Gutenberg has been its centralized copyright clearance facility. Here, those seeking permission to upload electronic versions describe the printed work and identify the edition. We upload legible images of each work’s title page and its verso for archiving, and a small amount of other data is collected to help pin down which edition of which work is being described. A small crew of PG volunteer staff checks these clearance requests when they have the time, and once due diligence is done and the request has been approved, they record a unique hashed “Clearance Key” and send it to the requesting party.

This key string is an explicit approval, by PG’s copyright clearance team, of the public domain status of the work.
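For the curious, here is a minimal sketch of how a hashed key like that could work. I don't know PG's actual key format, so everything here is an assumption for illustration: the field names, the `edition_fingerprint` and `issue_clearance_key` helpers, and the choice of SHA-256 and HMAC are all hypothetical. The idea is only that the key binds an approval to one edition and one requester, and that the authority can check it later.

```python
import hashlib
import hmac


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations hash alike."""
    return " ".join(text.lower().split())


def edition_fingerprint(title: str, author: str, publisher: str, year: int) -> str:
    """Hash the bibliographic fields that pin down one edition (hypothetical field set)."""
    fields = [normalize(title), normalize(author), normalize(publisher), str(year)]
    # The unit-separator character keeps adjacent fields from running together.
    return hashlib.sha256("\x1f".join(fields).encode("utf-8")).hexdigest()


def issue_clearance_key(secret: bytes, fingerprint: str, requester: str) -> str:
    """Assumed scheme: an HMAC over the edition and requester, keyed by the authority's secret."""
    message = f"{fingerprint}:{requester}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_clearance_key(secret: bytes, fingerprint: str, requester: str, key: str) -> bool:
    """Recompute the key and compare in constant time."""
    expected = issue_clearance_key(secret, fingerprint, requester)
    return hmac.compare_digest(expected, key)
```

Note that under this assumed scheme only the holder of the secret can verify a key, which is exactly the centralized arrangement described above.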

Now this has been practical, as I said. It was a good solution to the task at hand: covering PG’s butt by lending a shiny veneer of diligence to the absolutely-legal-but-still-prone-to-frivolous-lawsuits world of 21st century copyright paranoia.

But Project Gutenberg is not everything to everybody. Distributed Proofreaders is a big organization, and it doesn’t belong to PG, and should not be the last such institution that comes down the pipeline. We have books to digitize, and while some of us are empowered to do so with centralized vast secretive Efforts, the rest of us have to make do with what we can get.

There are a lot more books being digitized all the time.

So while PG’s current method allows one person with a book in her hand to get permission to scan and republish it… she’d better hurry. A clearance awarded to one person can, under the current system, be re-awarded to another, and re-re-awarded to a third. Then it’s catch-as-catch-can when it comes to who does what.

Worse, PG makes a strenuous (and stupid) effort to maintain perfect opacity when it comes to who’s doing what. Some weird-ass notion of “privacy”, I am told, where by “weird-ass” I actually mean “inefficient and centralized”: historically, the ARPANET-wielding founders of PG don’t want the Eye in the Pyramid to see where they hid their gold bullion, or something, and so they don’t like to leave tracks or traces. Or maybe they’re Europeans worried about fascists. Or maybe it’s just the Ron-Paul-susceptible techie world in general.

In any case, you can’t see who did what to whom in any PG context. Can’t see what’s cleared, let alone by whom. So if I am ambitious, and diligent, and request copyright clearances for 1000 books… and don’t get around to scanning all those 1000 books right away all at once, then odds are somebody else will blithely scan some out from under me.

And because I and they and the rest of the world have no idea who I or they are, or what we’ve got and what we’ve “claimed”, there’s no way to discover—when I actually set my butt down in front of the Plustek to scan a book now and then—if it’s redundant. If some other fool has manhandled a fragile $400 three-kilogram monstrosity and managed to get decent OCR material… well, to be honest, I’d rather have that $400 than the backache and the PNG files.

[No, after you.]

Or: There are more conflicts in which one book is cleared multiple times, by different people who never hear about it, and a great deal of work is done by lots of people to make multiple copies of something where (for now) one would do.

And: I have cleared some books that are stupid and boring. No, really. I don’t want to digitize them at all, now. Bad books. Notwithstanding the fact that I think they’re dumb, there’s also no accounting for taste, so surely some poor fool wants them. How can I let them have the clearance I choose to abandon? As things stand, there’s no way.

One quick solution would be a series of institutional remedies within Project Gutenberg.

But only slightly more complicated, and potentially more useful and more fun, would be to rip that functionality right out.

What a lovely, simple web application that would be: A central site that has user IDs, contact information, a community of users who can see and talk to each other, a public database of what’s been cleared and what “belongs” to whom, and a mechanism for people to abandon or release “claims” they don’t really want to follow up on. And a private but portable key that “signs off” on a work’s copyright clearance, for any application and for any authority-seeking consumer.

That would be nicer.
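To make the wish list concrete, here is a rough sketch of the public-database part: who cleared what, who currently “owns” the claim, and how a claim gets released and picked up by someone else. It is a toy under stated assumptions, not a design. The `Clearance` and `Registry` classes, the `work_id` strings, and the rule that a second clearance request simply returns the existing record are all hypothetical choices of mine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Clearance:
    work_id: str                      # hypothetical identifier for one edition
    cleared_by: str                   # user who did the clearance legwork
    claimed_by: Optional[str] = None  # user currently intending to scan it


@dataclass
class Registry:
    records: Dict[str, Clearance] = field(default_factory=dict)

    def clear(self, work_id: str, user: str) -> Clearance:
        """Record a clearance, or return the existing one instead of duplicating it."""
        existing = self.records.get(work_id)
        if existing is not None:
            return existing
        record = Clearance(work_id, cleared_by=user, claimed_by=user)
        self.records[work_id] = record
        return record

    def release(self, work_id: str, user: str) -> None:
        """Let the current claimant abandon a claim; the clearance itself stays on file."""
        record = self.records[work_id]
        if record.claimed_by != user:
            raise PermissionError(f"{user} does not hold the claim on {work_id}")
        record.claimed_by = None

    def claim(self, work_id: str, user: str) -> bool:
        """Pick up an unclaimed work; returns False if someone already holds it."""
        record = self.records[work_id]
        if record.claimed_by is not None:
            return False
        record.claimed_by = user
        return True

    def unclaimed(self) -> List[str]:
        """Public list of cleared works that nobody is currently working on."""
        return sorted(w for w, r in self.records.items() if r.claimed_by is None)
```

With something like this, a second person asking about an already-cleared book sees who has it rather than getting a fresh clearance, and a boring book I never want to scan shows up in `unclaimed()` the moment I release it.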