Some critics of the Google Books Settlement (GBS) have talked a lot about how it gives Google a monopoly on orphan works. But most of the commentators who have discussed orphan works and the Google settlement are sloppy in their language. In this post, I want to clarify the language and then take a stab at some numbers.
There are two sets of books governed by the settlement. First, there are the in-copyright but out-of-print books whose rights holders sign up with Google. We can call these rights holders "active rights holders." Second, there are the in-copyright but out-of-print books whose rights holders do not register with Google or the Book Rights Registry: the "inactive rights holders."
Some like to call this second group "orphan works," but that is wrong. This latter group actually consists of two separate groups. First, there are rights holders who could be easily located but who have chosen not to sign up with the Registry. Foreign authors whose works are normally protected by their national reproduction rights organization come readily to mind. I suspect that many could not conceive that their works could be used without their explicit permission and so see no need to register. Others may not learn of the settlement in spite of Google's advertising campaign. Because these authors could be easily identified and located, however, their works are not orphan works. The other portion of the "inactive rights holders" subset consists of the true orphan works: works whose copyright owners cannot be located, either because they cannot be identified or because their whereabouts are unknown.
The scope of the orphan works problem
Trying to come up with numbers is a very challenging task, but here is a quick attempt to get some ballpark figures. First, we need to look at the potential scope of Google's database. The Lavoie article on the Google 5 said that WorldCat contained 32 million print book records in 2005. I think that number is too high because we know that there is a tremendous amount of duplication in WorldCat, but let's use it as the outside potential limits of the Google database. Bowker's Global Books in Print reports 18.5 million book items in print, which leaves 13.5 million titles that are out-of-print. (Since that is a current figure and would include books published between 2005 and 2009, it is probably too high - but I also suspect that there are many in-print foreign titles that are not included. Let's use it.)
Lavoie reported that there were 5.4 million titles that were out of copyright (pre-1923), so we are left with roughly 8 million titles that are potentially in copyright but out of print. (Some of these would be American works that have not had their copyright renewed and hence are in the public domain, but I think the number could only be 1.7 million at most [originally 150,000], and so I am going to ignore that.)
[UPDATE: I got a good question about the number of works that might have entered the public domain, which pointed out that my original number was wrong. Here is my thinking: of the 8 million books, half are in English (following WorldCat numbers) and hence are likely to be American works. (I won't worry about books published only in England.) Of those 4 million, 63%, according to Lavoie, were published after 1963 and are still protected by copyright. That leaves 1.9 million works published between 1923 and 1963. A 1961 copyright study suggested that maybe 9% of these works were renewed and are still protected by copyright, though recent work by Michigan indicates that 41% of the works are still protected by copyright. If we assume 90% are public domain, then 1.7 million works are public domain. If 59% are PD, then 1.1 million are PD. Let's call it 1.5 million - and the number of in-copyright but out-of-print works should drop from 8 million to 6.5 million.]
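For readers who want to check the arithmetic in the update, here is a minimal Python sketch. It takes the post's figures as given (1.9 million works subject to renewal, 8 million in-copyright but out-of-print titles, and the two public-domain shares of 90% and 59%) rather than re-deriving them. The function and variable names are made up for illustration.

```python
# All quantities are in millions of titles, copied from the post as stated.
WORKS_SUBJECT_TO_RENEWAL = 1.9   # the post's figure for American works needing renewal
IN_COPYRIGHT_OUT_OF_PRINT = 8.0  # the post's pre-correction estimate
WORKING_PD_ESTIMATE = 1.5        # the post's "let's call it" compromise figure


def public_domain_titles(works, public_domain_share):
    """Titles that fell into the public domain, given the share not renewed."""
    return works * public_domain_share


# High case: roughly 90% not renewed (based on the 1961 study's 9% renewal rate).
high = public_domain_titles(WORKS_SUBJECT_TO_RENEWAL, 0.90)
# Low case: 59% not renewed (based on the Michigan figure of 41% still protected).
low = public_domain_titles(WORKS_SUBJECT_TO_RENEWAL, 0.59)

# Corrected count of in-copyright but out-of-print titles.
adjusted = IN_COPYRIGHT_OUT_OF_PRINT - WORKING_PD_ESTIMATE

print(f"public domain, high case: {high:.1f} million")
print(f"public domain, low case:  {low:.1f} million")
print(f"in copyright, out of print (adjusted): {adjusted:.1f} million")
```

The 1.5 million working figure is a judgment call that sits between the two cases; choosing a different midpoint moves the adjusted total by the same amount.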
So we are talking about 8 [6.5] million works published since 1923 that are in copyright but out of print. Of those, how many are going to have inactive rights holders? What percentage of authors are going to register with Google, and what percentage will ignore the call? Or, to look at it another way, what percentage of these works are true orphans?
Denise Troll Covey's numbers might provide some guidance. In CMU's random trial, she was unable to locate 21% of publishers. (Their random sample was not limited to out-of-print books, so the percentage might actually be too low.) If we assume that number would hold for the 8 [6.5] million, that would mean that we have about 1.7 [1.4] million true orphan works in the total database of 13.5 [12] million. (That number might actually be smaller since some rights holders other than publishers might come forward via the settlement.) The remaining 11.8 [10.6] million books would have rights holders who either registered with Google or chose not to register.
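The orphan estimate above can be reproduced with a short Python sketch. It assumes, as the post does, that the 21% unlocatable rate from the CMU trial carries over unchanged to the in-copyright, out-of-print titles. The function name and the scenario labels are illustrative only.

```python
# All quantities are in millions of titles, copied from the post as stated.
UNLOCATABLE_SHARE = 0.21  # share of publishers that could not be located in the CMU trial


def orphan_estimate(in_copyright_oop, total_oop, unlocatable_share=UNLOCATABLE_SHARE):
    """Return (true orphans, remaining titles) under a flat unlocatable share."""
    orphans = in_copyright_oop * unlocatable_share
    return orphans, total_oop - orphans


# Scenario labels are hypothetical; the figures are the post's own.
scenarios = {
    "original": (8.0, 13.5),   # before the public-domain correction
    "updated": (6.5, 12.0),    # bracketed figures after the correction
}

results = {name: orphan_estimate(*figures) for name, figures in scenarios.items()}

for name, (orphans, remaining) in results.items():
    print(f"{name}: {orphans:.1f} million orphans, {remaining:.1f} million other titles")
```

Because the whole estimate scales linearly with the unlocatable share, a rate somewhat above or below 21% shifts the orphan count proportionally.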
Even with orphan works legislation, the works of locatable rights holders who simply do not register would not be eligible for inclusion in a digitized books database, since they are not true orphans. The Google Books settlement is the only way to get cost-effective access to them.
What we need in the settlement is a compulsory license that would allow anyone to license the use of any work controlled by a non-active rights holder, and not just orphan works.
Let’s see, Google wanted to make books easier to find. So, a bunch of people in the book business sued for damages. (That’s the publishing business for you!) Now they come up with a settlement where books will sell over Google much as in any bookstore, even books that are still in copyright but not available in print. Heavens, says Law Professor P. Samuelson, the law cannot permit this without an Act of Congress, which by its last several Acts has created a vast number of books that are illegal to reprint even though they are no longer available for sale, so there’s no actual interest to protect by copyright except the interest in limiting the amount of information available to readers. We need to protect the right of George Orwell’s grandchildren to keep his works off of the Kindle, at least unless they can cash in. So what if the Constitution defines a purpose for copyright, and it isn’t money-grubbing by the author’s descendants, who would deprive the world of their ancestor’s works if they can’t cash in on them? But let’s go back to Congress and see what they come up with this time that might encourage some more donations from Hollywood in return for extending the length of copyright another hundred years or so. The implicit premise of Prof. Samuelson’s view is that our current copyright law is constitutional and that the courts cannot entertain any settlement that would (heavens!) make books more widely and easily available to the public.
Posted by: Albion Tourgee | July 28, 2010 at 02:35 AM
Jerome, thanks for the comment. You are right that the Settlement Agreement stipulates that the Registry "will attempt to locate Rightsholders": Section 6.1(c). I wonder, however, how much effort it will actually invest in this. Any unclaimed revenues from the three types of works with non-active authors - non-orphan works whose authors elect not to participate; true orphan works; and public domain works accidentally included in Google's products - will go to the Registry to fund its operations. Furthermore, the Registry is going to be governed by the Plaintiffs' representatives; they would have little reason to look for potential rights holders outside of their circles.
My guess is that the biggest class of unclaimed books will be works by foreign authors who either do not know about the settlement or have elected not to be part of it. I hold out little hope that the Registry will rush to pay fees to them when the revenue their books generate could be used to fund the operation of the Registry itself.
Posted by: Peter | June 28, 2009 at 03:36 PM
I am an SF attorney. Under the Google Book Settlement, a Registry trust is created to collect and disburse revenue from Google and other e-publishers for distribution to rights holders. If money adds up for any work, and is initially unclaimed, then the Registry will have the funds to search for and find the rightful parent for any orphan work. Orphan works that don't produce income may stay abandoned and forsaken, but may still pop up in Google Search results.
Posted by: Jerome Garchik | June 19, 2009 at 02:18 PM