CC0 and Data Citation

Originally posted on the UC Davis Library Scholarly Communications Blog, August 16, 2017.

I always recommend CC0 public domain waivers as the best practice for public data sharing. There are all sorts of good reasons why public domain waivers work better and more predictably than other alternatives, including the traditional Creative Commons license suite – but you can (and should!) read about those elsewhere. Instead, I want to address one common worry people have about applying a CC0 public domain waiver to their datasets. It often looks something like this:

I want to share my data, but it is important to me that I’m cited when people use it. I prefer CC BY to CC0 because this kind of attribution is what I care about most.

Good news! In most cases, your right to see your contributions cited is exactly the same under both CC0 and CC BY. Why? Because citation practices are built around ethical norms, not around legal requirements. The CC BY license does, of course, require “attribution.” But all this means, essentially, is that users agree to provide identifying information whenever the work is shared with the public. When it comes time for someone to recognize the role your data played in their research—when it’s time to provide that citation—CC BY is silent. Regardless of whether you use CC BY or CC0, you will have to turn somewhere else to see that your work is properly cited: academic integrity.

Plagiarism, Copyright, and Citations

Fortunately, well-established norms around plagiarism are there to fill this gap, and they are already part of your day-to-day experience as a scholar. When using or referencing someone else’s ideas and contributions, you cite your sources. You don’t do this because of copyright or because of a license term; you do it because honesty, transparency, and academic ethics require it. Plagiarism norms step in to resolve issues around credit where copyright would otherwise be silent.

So if your source is in the public domain, of course you still cite your source.  And if your data source is in the public domain because of a CC0 public domain waiver? Easy: you cite your source and don’t think twice.

And if you think that using a CC0 waiver might be interpreted by users as an invitation to plagiarize, think again. Among users of CC0 public domain waivers, whether for data or otherwise, it is completely normal to ask for recognition, and it is completely normal for users of CC0 works to happily provide it. It shouldn’t take any special leap of faith to imagine scholars getting it right on citation. After all, you’re already doing it all the time.

Redundancy is resistance: share your scholarship

Originally posted on the University of California Office of Scholarly Communication blog, August 2017.

Who has the right to make your scholarship available? Who is able to read it? And who can disappear it?

If you haven’t given these questions much thought to date, it is worth having a fresh look as national conversations about the power of information—and the awful power of misinformation—continue to grow in prominence. It is a bleak testament to the importance of the academic enterprise that the ways in which scholarship is made and accessed are disputed territory in the campaign against facts.

Information—and access to information—are surprisingly fragile. Yes, an increasing amount of scholarship and research data is now made publicly available, even when not published “open access,” particularly when federally funded or produced by the government. But many times, these public access resources are held only by the government or are only made freely available to the public through government programs. Even assuming the best, as formerly publicly accessible governmental resources go dark, the precariousness of single points of access to information and research of public importance has never been in sharper relief.

UC campuses and universities around the United States, distressed at the potential disappearance of government data, have been working to see it backed up. The Internet Archive, similarly, is creating a complete backup of its collections (including the invaluable materials in its Wayback Machine) in Canada to ensure they stay safe. But what can individual researchers at UC do to make certain their contributions remain available, regardless of what happens in DC?

The best first step is to see that your work isn’t merely available somewhere, but that it is redundantly available.

Today’s environment is an important reminder that any given institution might ultimately prove to be a flawed steward of the information in its control. Unfortunately, this risk can be compounded by our traditional mechanism for conveying knowledge, namely, to concentrate the authority to distribute scholarship, via the author’s copyright, in the hands of a very limited number of parties. Most typically, this would mean giving a single publisher broad control over where, when, and how a work of scholarship might appear. The risk this arrangement poses for authors and for the public is that a copyright owner might use its control to silence a work, rather than to further its reach. This might sound paranoid, but there is no shortage of examples of copyright used in precisely this way, whether it is the Ecuadorian government seeking to silence criticism of its former president or the owner of a basketball team looking to get rid of an embarrassing photograph. Less overtly censorial (but just as problematic) is the long track record among copyright holders of letting works fall out of print and out of view—or literally letting them rot—rather than keep them in the public eye for the duration of our ever-longer copyright terms.

Happily, it’s increasingly common for the rights over scholarship to be more diffusely distributed, and savvy authors have multiple avenues for ensuring their work doesn’t go dark. Each of the following might have the power to share a particular work of scholarship with the public:

  • Publishers, of course, serve this function.
  • Authors often retain rights to share their work, most particularly as a “preprint” version. You can learn more about publisher policies on author sharing on SHERPA/RoMEO.
  • Universities, under open access policies, can have the right to share institutionally affiliated scholarship (see below).
  • Government agencies now often require authors to share the research they fund.
  • The public itself might be empowered to share scholarly work, such as when it is published on open access terms under a Creative Commons license.

Posting scholarship redundantly helps overcome the shortcomings of a given venue. For UC researchers, our open access policies can be a particularly important bulwark against threats to your work’s accessibility. The mechanics are simple: the policies empower you to make your scholarship publicly available through an additional channel, UC’s eScholarship. In most cases, you have this right regardless of the specific language used in your publishing agreements. It’s a powerful way around restrictions that might otherwise keep your work solely in someone else’s control.

Own your OA Policy

Harvard-style open access policies are great. I’m glad my institution has one, and I think the mechanics are both clever and generally beneficial. But I’m often frustrated about some of the fiddly implementation details that leave faculty confused or, worse, exposed. These are university policies, thoughtfully crafted, that universities can and should stand behind. This means more than just crafting the license and alerting publishers; in my read, it also requires a sensible notice and takedown policy and the elimination of all doubt when it comes to inconsistencies with faculty contractual obligations.

Owning your OA policy step 1: take the breach of contract issue head on

Here’s the problem:  Harvard-style OA policies take the form of a license that faculty (and maybe others) grant to the university.

But many of the publishing agreements faculty sign purport to transfer or license some or all of their rights to their work to the publisher and warrant that the rights are being granted unencumbered.

The smart folks who crafted these things took a good look at the Copyright Act (Section 205(d)-(e)) and reached the conclusion that this state of affairs poses no real problem so long as publishers are made aware of the OA policies. Concentration in the scholarly publishing industry helps quite a bit in that regard.

At least, there’s no problem if your sole interest is ensuring the validity of the university’s right to make covered works available to the public. But what if you want to ensure faculty aren’t at risk for complying? Or avoiding compliance for fear of legal risk?

There are awfully good arguments to be made that there simply isn’t credible legal risk for authors signing these problematic publishing agreements. Even if faculty authors are in breach of their warranties, what would the damages be? Authors are (typically) giving away their work for free. And (much to the consternation of OA advocates) the availability of scholarship in IRs does not appear to date to have reduced publisher revenues. Besides, given the publishers’ knowledge of OA policies and their relative sophistication, any such claim would have bad faith written all over it.

From experience, I can say that faculty non-compliance for fear of legal risk is a real phenomenon. Understanding the priority of conflicting transfers under Section 205 is not exactly common knowledge, and the ins and outs of contract enforceability and damages are equally esoteric. What do faculty see? They see a promise to their publisher that would seem to prevent them from contributing to the repository.

At UC, our materials say not to worry about inconsistent agreements, but we don’t make the argument strongly enough. And, frankly, knowing that this practice might well give rise to contract liability (albeit, probably only nominal liability) is an uncomfortable position from which to be telling folks not to worry about it.

If the university believes that there’s nothing to worry about here (which I think is absolutely reasonable) it should take the additional step and put its money where its mouth is: promise to defend its authors against any actions alleging policy compliance to be a breach of the authors’ warranties. Own your policy! If the risk is truly nominal, which again is a reasonable position, then shifting the risk to the University should not be a big deal.

Owning your OA policy step 2: give up on the DMCA

Anybody reading this has probably heard some version of  this rant already. I’m sorry for beating this dead horse, but I’m only more convinced with time that this is important.  I’m thoroughly unconvinced by the argument that the placement of articles in institutional repositories in compliance with university open access policies is somehow a DMCA safe harbor eligible activity. Employees of an institution complying with an institutional policy? Those aren’t 512(c) “users,” those are “employees acting within the scope of their employment.” And you know what? That’s not a bad thing.

When operating as if it were running a safe-harbor eligible operation, the institution effectively declines to exercise its knowledge of and expertise in its policies and the law, and lets the decision of how to respond to a properly filed takedown request lie instead with faculty. Given that (a) faculty knowledge of the legal mechanics of OA policies is severely limited, (b) institutional interest in keeping covered material in its repository is (in theory) high, (c) few takedown notices are issued to IRs to begin with, and (d) institutions should not be throwing their faculty under the bus for compliance with institutional policy, I’m not sure that this is an optimal means for handling takedown notices.

Owning your policy means standing up for it when challenged, whether the challenge takes the form of a takedown or a breach of contract action. It makes communicating with authors less complicated and it leaves the institution properly responsible for institutional policy. I think it’s a no-brainer.

A semi-cooperative publishing model

Surprise: I have a few thoughts about publishing.

Let’s take folks at their word and assume that many are concerned about author remuneration in the book business. Let’s further assume that the authors they’re worried about are the ones who are actually struggling to make ends meet or are otherwise outside of (or on the periphery of) authorship as a profession for financial reasons. I’ve written elsewhere of my skepticism about the “grow-the-pie” approach traditionally associated with expanding copyright protections. Even if it worked (it’s a big if), I’d expect this to primarily reward existing market winners, making it a trickle-down approach to increasing author pay.

Finally, let’s assume we can’t just burn the whole thing down and start again. What are we to do? Well, here’s one quick publishing model that might help more equitably allocate reward:

Break down the author’s royalty into two segments. The first, and larger, segment is a traditional royalty based on total copies sold. This rewards the instinct that the market is somehow meritocratic and lets big sellers still be bigger winners.

The second segment (8%?) is diverted to a pool that is split between the publisher’s entire author list, probably on some pro rata basis depending on the number of total books currently with the publisher and the type(s) of publication at issue.

Couple this with a time-limited publishing agreement (twenty years?) to both (a) further allow standouts to capitalize on their success by leaving painlessly to monetize elsewhere, and (b) avoid dilution of the shared royalty pool by the accrual of titles over time. Books meeting certain sales standards are, of course, allowed to renew their contracts, although I imagine that scenario would probably be relatively uncommon.

Call it semi-cooperative publishing.
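
To make the arithmetic concrete, here is a minimal sketch of the two-segment royalty described above. It rests on assumptions the post leaves open: the pooled segment is taken as 8% of each title's earned royalty, the pool is divided equally per title currently on the publisher's list (so an author with more titles gets a proportionally larger share), and the function and parameter names are purely illustrative.

```python
def split_royalties(titles, pool_rate=0.08):
    """Split royalties into a sales-based segment and a shared pool.

    titles: list of (author, royalty_earned) pairs, one per title
            currently on the publisher's list.
    pool_rate: fraction of each title's royalty diverted to the pool
               (0.08 is the post's tentative figure, assumed here to
               apply to the royalty rather than to gross sales).
    Returns a dict mapping each author to their total payout.
    """
    if not titles:
        return {}

    # Every title contributes the same fraction of its royalty to the pool.
    pool = sum(royalty * pool_rate for _, royalty in titles)

    # Assumption: the pool is shared pro rata by title count.
    share_per_title = pool / len(titles)

    payouts = {}
    for author, royalty in titles:
        direct = royalty * (1 - pool_rate)  # traditional, sales-based segment
        payouts[author] = payouts.get(author, 0.0) + direct + share_per_title
    return payouts


if __name__ == "__main__":
    example = [("bestseller", 1000.0), ("midlist", 100.0), ("debut", 0.0)]
    for author, amount in split_royalties(example).items():
        print(f"{author}: {amount:.2f}")
```

The total paid out is unchanged; the pool only moves a slice of it from the biggest sellers toward the rest of the list. Other pro rata rules (weighting by type of publication, for instance) would change only the `share_per_title` step.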

You could, of course, go with a purely cooperative model, but that’s a bit of a different discussion, and comes with its own sets of complications.

So who’s in? Should we do this thing?


Getting at the root of predatory publishing

Aside from being genuinely pernicious and problematic on its own terms, the predatory publishing problem creates extra frustrations for open access advocates by providing an easy mechanism for directing suspicion and ire at open access publishing generally. And while it’s frustrating when this sleight of hand is employed by known open access critics, it can also come from within the movement, particularly in the context of ongoing debates about the APC business model with which predatory publishers are so closely associated.

Most recently, Kevin Smith furthered this conflation when outlining his skepticism of APC-funded open access. In that piece, Smith  identifies the predatory publishing problem as at the heart of what’s wrong with APCs, writing that “‘predatory’ journals . . . can only exist because of the APC business model” and further that  APCs are “the root problem” behind predatory practices.

I don’t think this is right and, further, by reinforcing the association between open access publishing and predatory publishing, I think the line of reasoning poisons the well for the whole movement.  Moreover, misdirecting blame for predatory publishing only serves to distract us from formulating productive responses to the problem.

So what is behind predatory publishing? The low costs of publication meeting author incentives.

Too often the APC model is isolated as a root cause of predatory publishing, rather than as predators’ present best mode of operation. But if APCs aren’t the causal driver, what is? I think a better explanation for the phenomenon is the marriage of declining publishing costs and warped author incentives.

In a fully digital, online publishing environment, the costs involved in publishing decrease dramatically. This observation isn’t a surprise to anyone committed to open access; the idea that the costs involved in distributing scholarly articles are so close to zero that the price to read them can and should be made zero is a founding principle of the movement.

In general, low publishing costs are something we celebrate. Not only do they make open access possible, but they also facilitate entry into the publishing market.  Low costs of doing business are precisely how we are able to field new, public-minded competitors to the legacy players and explore new publishing models.

But the same conditions that provide all this promise also appeal to folks with fewer scruples.  Scholarly authors are often under intense pressure to publish, with some valuing publication sufficiently to be both willing to pay to do it and willing to overlook (whether inadvertently or intentionally) the failings of a given outlet. Predators have the means (low cost publication), motive (pecuniary gain), and opportunity (a large pool of scholars eager to publish) to do what they do. Importantly, they would have all these things whether or not APCs were employed by respectable publishers.

The point is that predatory publishing is best seen as a close cousin of “the fake news” problem. It isn’t a narrow phenomenon rooted in the niceties of how scholarly publishing is funded; it’s a manifestation of a much broader set of issues concerning the provenance and validation of information in a world where we’re all a name, a logo, and a website away from declaring ourselves publishers.

Predatory publishing is not just a scholarly publishing problem

In his article, Smith acknowledges that “[t]here are, of course, predatory practices throughout the publishing industry, and they take a lot of different forms.” This point is important and needs highlighting. If predatory publishing practices happen outside of areas where author-pays models are commonly accepted, it should suggest strongly that the acceptance of author-pays models is not the driver of the phenomenon.

And indeed, scholarly authors are not alone in being plagued by suspect or scammy publishing practices. While the pressures of “publish or perish” might be largely unique to the academy, as is the normalization in some sectors of pay-to-publish models (page fees in certain sciences, APCs in gold OA), many of the same patterns and tactics used by “predatory publishers” in the scholarly context are also used to earn pay-to-publish fees out of would-be trade authors. And this in an area where “[y]ou should never pay to be published” is common wisdom.

In my time serving as the executive director of Authors Alliance, I saw this problem firsthand while trying to assist members who had been, essentially, conned by less-than-legitimate publishing operations. The phrase “predatory publishing” hasn’t been adopted in these communities, but the concept and underlying causes are the same.

Resisting easy solutions

We would all like to distill the definition of “predatory publisher” down to one, easily identified, perfectly predictive, and unmistakable attribute. Imagine how functional journal blacklists would be if we could safely declare the charging of author fees always and everywhere illegitimate?

The first problem, of course, is that we know this isn’t true in practice. It wasn’t true when traditional subscription publications adopted page fees, it wasn’t true when PLOS and BMC adopted APC-funded OA, and it’s not true of the high-quality gold OA outlets operating today. An attack on APC-funded publications generally is necessarily over-inclusive. Are we happy to have an error cost in our methodology that we know takes down the good along with the bad?

Some commentators are explicitly comfortable with that cost. Take, for instance, Raghavendra Gadagkar, who in a note at the Royal Society Journal of the History of Science wrote that— 

[T]he ‘pay-to-publish’ model should be dismantled altogether. We should gradually create social and moral stigma, and eventually legal strictures, against paid publications; having paid for publishing scholarly papers should automatically devalue their prestige and eventually disqualify them from consideration.

Even if Gadagkar’s proposed stigmatization (and criminalization?) of APCs were successfully implemented, the approach would have the unfortunate effect of targeting good actors, while doing little to hurt the bad actors motivating the policy. Consider: do true predatory outlets have concern for the prestige of their publications? No. Do the authors who publish in them think they are buying such prestige? Well, sometimes—when they are confused about who it is they are actually publishing with—but, generally, no.  Are predatory actors concerned about the law? Well, the worst predatory publishers are already on the wrong side of the law in many jurisdictions, and it hasn’t seemed to do much to ameliorate the problem.

Instead, all of these levers primarily bear on the folks who already have a commitment to operating within the system. It’s like going after fare avoiders by locking the turnstile. The jumpers still get over fine, but the folks who would happily pay the fare are locked out.

Understanding the problem in this light doesn’t  mean we have to be fatalistic and it doesn’t make us technological determinists.  But it does suggest that the root causes of the problem are deep and complex enough to require active, ongoing, and dynamic countermeasures. Accept no less and nothing simpler.

The takeaway

While it’s easy to invoke the specter of predatory publishing to discredit a model of open access one doesn’t like, everyone in the open access movement should walk away from this line of argumentation. Why? Because we should all know by now that predatory publishing  is not going away anytime soon,  and continued confusion about its connection to OA hurts everyone. It should be our shared goal to work to counteract predatory practices and to distinguish these from the work done by trustworthy open access outlets. But there’s simply no good to be done by continuing the conflation of any kind of “open” and “predatory.”

As for APCs, let’s continue having the important and serious discussions about their place in open access scholarship and their effects on the dynamics, incentives, and accessibility of scholarly publishing. But let’s move beyond the  under-developed charge that APCs are behind predatory publishing.


Quick disclaimer re the “.attorney” TLD

I want to issue a quick disclaimer pointing out that, yes, I know this TLD is kind of dumb—I mean, .attorney, really?

The basic story here is that being named “Michael Wolfe” is not a particularly great thing when it comes to domain name acquisition. There are just an awful lot of us out there.

A clever person would have bought an arbitrary or fanciful domain instead, but I hit enough dead ends with that approach that ultimately I figured it would be easier to just buy my name with a goofball TLD.

So there you have it. I am an attorney, technically, but this site has nothing to do with the advertising or provision of legal services.

A new site

For the last few years, I’ve been blogging at Berkeley Law under the title Public Interest Authorship and in a few other venues. With my time at Berkeley at an end, I’m moving my personal blogging here, along with my backlog.

Please excuse all of the broken links / dropped images / etc. in the backlog. They probably won’t be fixed, but what are you going to do?

More to come.

Quantifying termination possibilities — an experiment with HathiFiles

US copyright law has “termination of transfer” processes that allow many authors to reclaim rights after a certain period of time has elapsed. While strikingly powerful (termination rights are inalienable!), they are also relatively arcane. While one can imagine termination of transfer being of important public-interest relevance, as a means to renew public access to out of print or otherwise unavailable works, the relevant authors aren’t likely to participate without support. That is, while news stories emerge from time to time about prominent entertainers seeking to reclaim their rights, it’s hard to imagine building the kind of public understanding around their timing and proper exercise that would be necessary to see them really used at scale by unrepresented parties or for works without much commercial future.

Hard, but not impossible.

The law (17 USC §§ 203, 304(c)-(d)) is hard to read and make sense of, but it’s also largely mechanical. With a little effort, parsing it can be automated—making the law more accessible to folks without lawyers. More on that to come.

But then there’s the question of timing. The termination process was not designed to be easy. It’s only available within a five-year window, and can only be exercised if notice is provided significantly in advance of the actual termination. For anyone not responsible for writing a catalog of 30-year-old hit songs, paying enough attention to get the timing right isn’t terribly likely.

But maybe this too is something where tools can help.

The precise timing of a termination right depends on enough inputs (e.g., when was the transfer? what year was the work published? when was the work copyrighted?) that getting accurate estimates from publicly available records and data just isn’t possible. But the timing is just stable enough that, provided we know the publication year, we can at least guess that we’re in the ballpark. So if all we want to do is generate a list of titles that are close enough to the right time to warrant investigation from their authors? Well, that we can do.

Happily, HathiTrust makes a good chunk of the metadata it’s gathered for its corpus publicly available, giving us access to records for some 14 million volumes, with some measure of information on their current rights status. Not a bad start.

Now, you would expect the scale to be relatively large. Would-be terminators have to provide notice between 2 and 10 years in advance of a time falling within a 5-year window, meaning that at any given time there are as many as fourteen calendar years for which termination notices can be made under either of the Copyright Act’s two termination provisions. Which means there are as many as 28 calendar years from which a terminable book can hail at any given moment.
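
The count above can be checked with a rough sketch. It works at whole-year granularity and makes simplifying assumptions that are mine rather than the statute's: publication year stands in for both the date of the grant and the date copyright was secured; the § 203 window is taken to open 35 years after publication (for grants from 1978 onward) and the § 304(c) window 56 years after (for pre-1978 works); § 304(d) and the alternative 40-year rule in § 203 are ignored. All names are illustrative.

```python
NOTICE_MIN_LEAD = 2    # notice must precede the effective date by at least 2 years
NOTICE_MAX_LEAD = 10   # ...and by no more than 10 years
WINDOW_LENGTH = 5      # the termination window stays open for 5 years
SEC_203_OFFSET = 35    # assumed: window opens 35 years after publication
SEC_304C_OFFSET = 56   # assumed: window opens 56 years after publication
CUTOVER_YEAR = 1978    # section 203 covers grants from 1978 on; 304(c) earlier works


def actionable_window_starts(current_year):
    """Years in which a window may open such that notice can be served now.

    A notice served now can name an effective date 2 to 10 years out, and
    that date must fall inside the 5-year window, so the window may have
    opened up to 3 years ago or may open up to 10 years from now.
    """
    earliest = current_year + NOTICE_MIN_LEAD - WINDOW_LENGTH
    latest = current_year + NOTICE_MAX_LEAD
    return range(earliest, latest + 1)


def candidate_publication_years(current_year):
    """Map each provision to publication years that might be actionable now."""
    years = {"203": [], "304(c)": []}
    for start in actionable_window_starts(current_year):
        year_203 = start - SEC_203_OFFSET
        if year_203 >= CUTOVER_YEAR:
            years["203"].append(year_203)
        year_304 = start - SEC_304C_OFFSET
        if year_304 < CUTOVER_YEAR:
            years["304(c)"].append(year_304)
    return years


if __name__ == "__main__":
    for provision, years in candidate_publication_years(2016).items():
        print(provision, years[0], "to", years[-1], f"({len(years)} years)")
```

Taking 2016 as an arbitrary example year, this yields fourteen publication years under each provision (1978 to 1991 and 1957 to 1970), or 28 in all. It is only a ballpark filter; the real dates depend on inputs that publication year cannot capture.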

Given that there are some fourteen million volumes in the HathiTrust corpus, the number of included titles that might be presently actionable isn’t likely to be tiny. Restricting titles only to those dating between 1923 and the present, this intuition is pretty well confirmed:

I filtered that slice of the dataset through a few other restrictions to whittle down today’s set of candidate titles. Lopping off non-book items and those already available under Creative Commons licenses, I grabbed the list of titles that might be eligible under either the 203 or 304 tests: 2,534,535 entries total.

Now, unfortunately the HathiFiles dataset isn’t entirely clean, and there are all sorts of reasons that publication date might fail as a proxy for the relevant determination tests. But, if your main goal is to build an awareness campaign around the availability of termination targeted toward academics, well, it’s not a bad start.

And if 2.5 million is too large a number to be workable, well, HathiTrust has another qualification that can prove helpful: they’ve already determined that certain books are out of print. Looking at just those titles leaves us with just 590 presently actionable titles (take a look at them here) that likely have an availability problem.
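
The filtering steps described above might look something like the following sketch. It is not the script actually used for this experiment. The field names (`bib_fmt`, `rights`, `rights_date_used`) and code values (`"BK"` for books, a `"cc"` prefix for Creative Commons rights codes, `"op"` for out of print) are assumptions that should be checked against the current HathiFiles documentation, and the candidate years are supplied by the caller.

```python
def filter_candidates(records, candidate_years, out_of_print_only=False):
    """Return the records that look like termination candidates.

    records: iterable of dicts, one per volume (for example, rows read
             from a tab-delimited file with csv.DictReader).
    candidate_years: set of publication years worth investigating.
    out_of_print_only: if True, keep only volumes marked out of print.

    Field names and code values are assumptions, not a documented API.
    """
    selected = []
    for record in records:
        # Keep books only (assumed format code "BK").
        if record.get("bib_fmt") != "BK":
            continue

        # Drop volumes already shared under Creative Commons licenses.
        rights = record.get("rights") or ""
        if rights.startswith("cc"):
            continue

        # The dataset is not entirely clean: skip unusable dates.
        try:
            year = int(record.get("rights_date_used"))
        except (TypeError, ValueError):
            continue
        if year not in candidate_years:
            continue

        if out_of_print_only and rights != "op":
            continue

        selected.append(record)
    return selected


if __name__ == "__main__":
    sample = [
        {"bib_fmt": "BK", "rights": "ic", "rights_date_used": "1980"},
        {"bib_fmt": "BK", "rights": "op", "rights_date_used": "1965"},
        {"bib_fmt": "SE", "rights": "ic", "rights_date_used": "1980"},
    ]
    years = set(range(1957, 1971)) | set(range(1978, 1992))
    print(len(filter_candidates(sample, years)))
    print(len(filter_candidates(sample, years, out_of_print_only=True)))
```

Keeping the out-of-print restriction as an optional flag mirrors the two passes in the text: first the broad list, then the much shorter one.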

It’s still early days on this project; much more to come.

Recent Tabs

I’ve been dormant here for a long time, but meanwhile there’s been no shortage of important news and events, so much so that closing out my ever-expanding set of browser tabs is looking ever less likely. Here are a few items of note curated from that collection:

Chris Kelty, “Open Access, Piracy, and Scholarly Publication”. I was disappointed to be unable to attend this recent talk at Davis, but the good news is that it was recorded. Chris was the driving force behind the UC open access policy, author of a must-read OA monograph on free software, and a scholar of “open” communities. As such, it’s no surprise that he has an interesting thing or two to say about scholarly publishing. The talk really pulls no punches, so if you want to see both Elsevier and institutional repositories taken to task (and harbor hope for something better than either), it’s worth watching.

99% Invisible, “The Giftschrank.” 99% Invisible dives into the very opposite of open access in exploring the German history of locking dangerous texts in “poison cabinets” or giftschranks. Not only does it raise all sorts of questions about the regulation of information (What do we do with dangerous information? Who decides that it is in fact dangerous? What can we learn from what past societies found dangerous?), but it has an interesting copyright nexus with Mein Kampf and all the recent news about that notorious book falling into the public domain in Germany (the Bavarian government, which held the copyright, had used that control to refuse publication of the text—copyright as giftschrank).

Public comments are out from the Copyright Office’s § 1201 study. And public roundtables are being scheduled for May in both DC and San Francisco.

Michael Eisen, “On Pastrami and the Business of PLOS.” OA business models provoke no shortage of ethical and pragmatic concerns. Michael Eisen (the PLOS cofounder/OA advocate/Berkeley biologist) takes a hard and candid look at some of the questions that have been raised about PLOS, and it’s a worthwhile read.