Quantifying termination possibilities — an experiment with HathiFiles

US copyright law has “termination of transfer” processes that allow many authors to reclaim rights after a certain period of time has elapsed. Though strikingly powerful (termination rights are inalienable!), they are also relatively arcane. Termination of transfer could serve the public interest as a means to renew public access to out-of-print or otherwise unavailable works, but the relevant authors aren’t likely to participate without support. News stories emerge from time to time about prominent entertainers seeking to reclaim their rights, yet it’s hard to imagine building the kind of public understanding of the timing and proper exercise of these rights that would be necessary to see them really used at scale by unrepresented parties or for works without much commercial future.

Hard, but not impossible.

The law (17 USC §§ 203, 304(c)-(d)) is hard to read and make sense of, but it’s also largely mechanical. With a little effort, parsing it can be automated—making the law more accessible to folks without lawyers. More on that to come.

But then there’s the question of timing. The termination process was not designed to be easy. It’s only available within a five-year window, and can only be exercised if notice is provided significantly in advance of the actual termination. For anyone not responsible for writing a catalog of 30-year-old hit songs, paying enough attention to get the timing right isn’t terribly likely.

But maybe this too is something where tools can help.

The precise timing of a termination right depends on enough inputs (e.g., when was the transfer? what year was the work published? when was the work copyrighted?) that getting accurate estimates from publicly available records and data just isn’t possible. But the timing is just stable enough that, provided we know the publication year, we can at least guess that we’re in the ballpark. So if all we want to do is generate a list of titles that are close enough to the right time to warrant investigation from their authors? Well, that we can do.

Happily, HathiTrust makes a good chunk of the metadata it’s gathered for its corpus publicly available, giving us access to records for some 14 million volumes, with some measure of information on their current rights status. Not a bad start.

Now, you would expect the scale to be relatively large. Would-be terminators have to provide notice between 2 and 10 years in advance of a time falling within a 5-year window, meaning that at any given time there are as many as fourteen calendar years for which termination notices can be made under either of the Copyright Act’s two termination provisions. Which means there are as many as 28 calendar years from which a terminable book can hail at any given moment.
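To make that arithmetic concrete, here is a rough Python sketch of the notice-window calculation. It leans on assumptions that are mine rather than the statute's: it uses publication year as a stand-in for the dates the law actually cares about (§ 203 counts 35 years from the grant, with a variant rule for publication rights, and § 304(c) counts 56 years from when copyright was secured), it ignores § 304(d), and the function names are purely illustrative.

```python
NOTICE_MAX_LEAD = 10  # notice may be served at most 10 years before the effective date
NOTICE_MIN_LEAD = 2   # ...and no fewer than 2 years before it
WINDOW_LENGTH = 5     # termination must take effect within a 5-year window

# ASSUMPTION: publication year approximates the statutory trigger date.
# Offsets are years from publication to the opening of the window.
OFFSETS = {"203": 35, "304(c)": 56}


def notice_years(pub_year, provision):
    """Calendar years in which notice could plausibly be served."""
    opens = pub_year + OFFSETS[provision]
    closes = opens + WINDOW_LENGTH
    # Earliest notice: 10 years before the window opens.
    # Latest notice: 2 years before the window closes.
    return range(opens - NOTICE_MAX_LEAD, closes - NOTICE_MIN_LEAD + 1)


def actionable_provisions(pub_year, current_year):
    """Provisions under which notice might be served in current_year."""
    hits = []
    for provision in OFFSETS:
        # Section 203 covers grants made in 1978 or later; section 304
        # covers works already under copyright before 1978.
        if provision == "203" and pub_year < 1978:
            continue
        if provision == "304(c)" and pub_year >= 1978:
            continue
        if current_year in notice_years(pub_year, provision):
            hits.append(provision)
    return hits
```

Under these assumptions, `len(notice_years(1985, "203"))` comes out to 14, which is where the fourteen calendar years come from, and `actionable_provisions(1985, 2015)` flags the § 203 route. Anything this flags is only a prompt to go look at the actual grant and copyright dates.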

Given that there are some fourteen million volumes in the HathiTrust corpus, the number of included titles that might be presently actionable isn’t likely to be tiny. Restricting titles only to those dating between 1923 and the present, this intuition is pretty well confirmed:

I filtered that slice of the dataset through a few other restrictions to whittle down today’s set of candidate titles. Lopping off non-book items and those already available under Creative Commons licenses, I grabbed the list of titles that might be eligible under either the 203 or 304 tests: 2,534,535 entries total.

Now, unfortunately the HathiFiles dataset isn’t entirely clean, and there are all sorts of reasons that publication date might fail as a proxy for the relevant determination tests. But, if your main goal is to build an awareness campaign around the availability of termination targeted toward academics, well, it’s not a bad start.

And if 2.5 million is too large a number to be workable, well, HathiTrust has another qualification that can prove helpful: they’ve already determined that certain books are out of print. Looking only at those titles leaves us with just 590 presently actionable titles (take a look at them here) that likely have an availability problem.
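For the curious, the two filtering passes described above might look something like the following Python sketch. This is not the script I actually ran. It assumes the HathiFiles dump is tab-separated with a header line prepended, and that the relevant columns are named `rights`, `bib_fmt`, and `rights_date_used`, with `BK` marking books, `cc`-prefixed codes marking Creative Commons items, and `op` marking out-of-print titles. Check those names and codes against the current HathiFiles documentation before relying on them. The sample rows are made up purely to exercise the code.

```python
import csv
import io


def is_candidate(row, candidate_years, out_of_print_only=False):
    """Apply the filters from the post to one metadata row (a dict)."""
    rights = row.get("rights") or ""
    if row.get("bib_fmt") != "BK":          # ASSUMED code for books
        return False
    if rights.startswith("cc"):             # ASSUMED prefix for CC licenses
        return False
    if out_of_print_only and rights != "op":  # ASSUMED out-of-print code
        return False
    try:
        year = int(row.get("rights_date_used"))
    except (TypeError, ValueError):
        return False                        # unusable or missing date
    return year in candidate_years


def filter_candidates(lines, candidate_years, out_of_print_only=False):
    """Read tab-separated lines (header first) and keep candidate rows."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [r for r in reader
            if is_candidate(r, candidate_years, out_of_print_only)]


# Made-up rows for illustration only; not real HathiTrust records.
SAMPLE = (
    "htid\trights\tbib_fmt\trights_date_used\ttitle\n"
    "x.1\tic\tBK\t1985\tHypothetical monograph A\n"
    "x.2\top\tBK\t1960\tHypothetical monograph B\n"
    "x.3\tcc-by\tBK\t1985\tHypothetical open title\n"
    "x.4\tic\tSE\t1985\tHypothetical serial\n"
    "x.5\tic\tBK\tunknown\tHypothetical undated title\n"
)

# Publication years treated as "in the ballpark" for this example run.
YEARS = set(range(1956, 1970)) | set(range(1978, 1991))

broad = filter_candidates(io.StringIO(SAMPLE), YEARS)
narrow = filter_candidates(io.StringIO(SAMPLE), YEARS, out_of_print_only=True)
```

Here `broad` keeps the two in-range books that are not Creative Commons, and `narrow` keeps only the one marked out of print. Swapping `io.StringIO(SAMPLE)` for an open file handle would run the same filters over a full dump.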

It’s still early days on this project; much more to come.