Shortly after the Google Book Settlement was announced, a colleague wrote to ask what I thought of its procedures for identifying public domain books. I gave a quick response, but promised a more detailed analysis. Here it is, months later.
My quick assessment: the settlement specifies procedures that are likely to identify most public domain works published in the United States. It is less helpful for foreign publications that may have entered the public domain; they are largely absent from the process. Unfortunately, because this is part of litigation rather than legislation, no one else can take advantage of the results of the process - it moves us no closer to having a growing public domain. What is unknown is to what extent Google will want to remove titles from the licensed products and make them freely available to the public.
The Public Domain in the Settlement
At first glance, one might assume that the public domain has nothing to do with the settlement. Works that are either in or that enter the public domain in the US are excluded from the definition of "Book" and hence are not covered by the settlement. Google does not have to pay royalties to the Registry for the use of public domain books and can instead do what it wants with them.
The key issue is deciding what is in the public domain. As anyone who has tried it knows, determining the current copyright status of a book is no easy task and mistakes are easy to make. Google, therefore, cleverly included a "safe harbor" for itself in the settlement with regard to its public domain determinations. First, Google must follow the procedures outlined in Attachment E to determine whether a title is in the public domain. Second, it must provide to the Registry a list of titles that it feels are in the public domain but that have a copyright date after 1922. It must also construct a searchable database listing these titles. (Both of these steps are designed to allow rights owners to object to Google's assessment.) If Google follows these steps, it may use these titles, which I will call "presumed public domain books," in limited ways without liability.
The Public Domain Safe Harbor Process
The procedures that Google must follow in order to take advantage of the safe harbor are found in Attachment E to the settlement agreement. The purpose of the process is not to determine the actual copyright status of books, but rather to identify those titles that are eligible for the safe harbor provisions. Google and the Registry are willing to presume that these titles are in the public domain, but they cannot extend that presumption to anyone else.
Nevertheless, the procedures will likely identify the current copyright status of a work. At least two people must examine each work and determine its copyright date, its place of publication, and whether it has a copyright notice. Google is supposed to use bibliographic and other records to determine if the book was first published abroad or was published abroad within two years of US publication. (This addresses a problem I have discussed elsewhere: the difficulties created by the restoration of foreign copyrights.) Google is also supposed to search Copyright Office records to determine if copyright in a work was renewed or was registered after publication without notice.
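To make the shape of these checks concrete, here is a rough sketch of how such a screening step might be organized. It is not Attachment E: the `Review` record and the `presumed_public_domain` function are hypothetical, the "two reviewers" requirement is reduced to "both reviewers recorded identical findings," and the 1923, 1963, and 1977 cutoffs are my simplified reading of the general US rules in effect when this was written (2009), not text taken from the settlement. Foreign works and the Twin Books problem are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Review:
    """One reviewer's findings for a single book (hypothetical record layout)."""
    copyright_year: int
    published_in_us: bool
    has_notice: bool


def presumed_public_domain(
    first: Review,
    second: Review,
    foreign_publication_found: bool,
    renewal_found: bool,
    registration_found: bool,
) -> Optional[bool]:
    """Simplified, hypothetical screen for 'presumed public domain' status.

    Returns None when the two reviewers disagree and the book needs another look.
    """
    # Two people examine each book; here we simply require identical findings.
    if first != second:
        return None
    r = first

    # Pre-1923 books are treated as public domain (Twin Books is ignored here).
    if r.copyright_year < 1923:
        return True

    # First publication abroad, or publication abroad within two years of US
    # publication, takes the book out of this simplified US-only path.
    if not r.published_in_us or foreign_publication_found:
        return False

    # Books from 1978 on are outside this sketch and treated as protected.
    if r.copyright_year > 1977:
        return False

    # Published without notice: presumed public domain unless the Copyright
    # Office records show a registration after publication.
    if not r.has_notice:
        return not registration_found

    # Published with notice 1923-1963: public domain only if not renewed.
    if r.copyright_year <= 1963:
        return not renewal_found

    # 1964-1977 with notice: renewal was automatic, so still protected.
    return False


review = Review(copyright_year=1950, published_in_us=True, has_notice=True)
print(presumed_public_domain(review, review, False, renewal_found=False, registration_found=False))
```

For a 1950 US book with notice and no renewal on record, the sketch prints `True`. Even in this toy form, most of the work lies in gathering the inputs (two independent reviews, foreign-publication research, renewal and registration searches), not in the decision logic itself.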
The procedures outlined in Attachment E should identify most works that have risen into the public domain in the U.S. The difficulties I envision are few:
- The rules only address presumed U.S. publications from after 1922 and non-U.S. publications from before 1923. Any foreign works that may have entered the public domain in the U.S. (by having entered the public domain in their home country prior to 1996) are excluded. Nor would the foreign works recently restored to the public domain by a decision in the 10th Circuit be eligible for the safe harbor provisions.
- We know as well that the determination of prior foreign publication is very difficult. The bibliographic utilities available were not designed with precision in mind, and it is likely that there will be some foreign works that are inaccurately identified as "presumed U.S. publications" and hence presumed to be in the public domain. (Of course, there are provisions to allow rights holders to overturn public domain determinations.)
- Even with American works, a book that appears to be in the public domain (because, for example, it was published without a copyright notice) could contain separately copyrighted material. Reproducing the entire work would infringe the copyright in that pre-existing material. The safe harbor would protect Google against this small danger.
- It is likely that some of the books that Google will identify as being in the public domain using this process are actually considered to be unpublished - primarily because they were never distributed to the general public. As unpublished works, they would still be protected by copyright, and be outside of the safe harbor provisions.
One oddity: the settlement agreement assumes that books published prior to 1923 are in the public domain; Google is only required to provide the Registry with lists of public domain books published after 1922. Attachment E, however, envisions Google submitting lists of pre-1923 titles to the Registry as well.
Issues with the public domain provisions
While the procedures for determining presumed public domain status may be acceptable, there are still a number of issues surrounding the place of public domain works in the settlement:
- Google is supposed to search either the Copyright Office's Registration and Renewal records "or a copy thereof." There is no indication of what would constitute an acceptable copy. Would it include the machine-readable records since 1978 available from the Copyright Office? What about the published Catalog of Copyright Entries (CCE)? Under the 1909 Copyright Act, the CCE was admissible in court as evidence of registration, but that provision was dropped in the 1976 Act. Does the CCE remain a "copy" of the Copyright Office records? And what about unofficial online versions of the renewal records, such as the database at Stanford? We don't know how well these correspond to the printed volumes, and we know that the printed volumes of the CCE do not correspond perfectly to the records in the Copyright Office. Would searching these sources be acceptable?
- The rule for pre-1923 works that are presumed not to be US publications seems to make no sense. I assume it is meant to address the ridiculous Twin Books decision, which held that a book published abroad without notice was in effect unpublished for US copyright purposes. If the book was later published in the US with notice or registered with the Copyright Office, the copyright clock would start ticking at that point. The rule as written, however, suggests that Google must examine the subsequent copyright history of all foreign works published before 1923, even those published in the 18th century, if it wishes to take advantage of the safe harbor provisions.
- The procedures to establish the safe harbor are going to be time-consuming and expensive. Two people are going to have to examine each book and record data. More people will have to search the Copyright Office records.
- The extent of protection that the safe harbor provides to Google is unclear. The agreement says that if Google follows the process for a title, "then Google may treat such book as if it is in the public domain under the Copyright Act." If the agreement stopped there, things would be fine. Works that have risen into the public domain can be used any way you want, without asking permission or incurring liability. But the agreement does not stop there: it goes on to qualify that treatment in two clauses, (a) and (b).
What does clause (a) mean? It seems to be saying that you can use the presumed public domain book as if it were a Display Book. With Display Books, one or more Display Uses are authorized. The Display Uses are Snippet Display, Front Matter Display, Access Uses, and Preview Uses. This clause would seem to be suggesting that with the presumed public domain book, you can use any of these without liability. The key one is Access Uses: it allows Google to display the full text of works. So clause (a) appears to be saying that Google would have no liability for displaying the full text of presumed public domain works, and clause (b) says it has no liability for downloading of presumed public domain works.
But what about the other things that Google might do with these books if they were really in the public domain? For example, can Google allow printing of these presumed public domain books (something that is limited with Display Books)? Considering the money that Google is paying to the plaintiffs' lawyers ($30 million!), this (like so many other things in this section) could be clearer.
- The safe harbor provisions apply only to Google. Anyone else who wants to use one of these books would face the draconian statutory damages for copyright infringement if it turned out the book was actually still copyrighted. Even with all this effort, one will not be able to say with certainty that a book is in the public domain. That would require a legislative change, not a negotiated settlement.
Will Google exploit the safe harbor?
As I look at these procedures, I am not at all sure that Google will ever take advantage of them. As we have seen, the costs of doing so will be high and the benefits limited. What are the benefits? I believe there are two. First, Google can avoid liability for any infringing uses in the rare cases where a work is not really in the public domain. But of course if Google just left the books as part of the settlement and did not try to assert that the titles are presumed to be in the public domain, it would also avoid liability - that is the whole point of the settlement!
The other benefit to Google is that it would not have to pay the Registry the 70% of the net revenues due to it for the use of these works. Some publishers may also claim ownership of works that are in reality in the public domain; Google could save the $60 cash payment due to these publishers if it contested the publishers' assertions.
On the other hand, keeping these works in the settlement might be to Google's advantage. Doing so would add to its monopoly of digitized content, which is limited to orphan works in the settlement as written. Third parties could, if they wished, go through the expensive process of identifying public domain works in the Google corpus and then start using them freely. It would be much cheaper, however, to let Google do the work of identifying titles that are presumably in the public domain and then take the calculated risk of exploiting those works (without the benefit of the safe harbor) on the reasonable assumption that Google's procedures are rigorous enough to identify works that are truly in the public domain. If Google does not identify these works as public domain works, they become quasi-orphans that Google can monopolize.
Of course Google itself could decide not to take advantage of the safe harbor. It could on its own assert that certain works are not subject to a "copyright interest," and hence are outside the scope of the settlement agreement. (It would have to do this with foreign works, since almost none are covered by the safe harbor.) If Google trusts in its investigative processes, it may not need the safe harbor.
But in the end, I suspect that the inclusion of the safe harbor provision in the agreement is a sign that Google is interested in expanding the number of titles that it can freely offer to the public. If we assume that Google was planning on engaging in copyright investigations all along, then securing a safe harbor for the fruits of one's efforts seems like a smart idea. It may also indicate that Google is as repulsed as the rest of us at the greed of the plaintiffs who want to seize for themselves the royalties that in reality are due to others or that need not be paid in the first place.
Bottom line: until we know how Google is going to implement the settlement, we cannot tell whether it will be good or bad for increased public access to books likely to be in the public domain.
Identifying Government Works
The Google Book Settlement Agreement is one more example of why the federal government should clearly and unambiguously mark and give notice about the copyright status of the information it produces, sponsors and disseminates. Government has neglected its responsibility in this matter. The consequence is that the time and cost burden of doing an after-the-fact copyright analysis is passed to all users, both government and citizens. Rather than knowing certainly and unequivocally what is a "work of the U.S. Government," the best a user can do is make a "presumed" determination or guess. The margin for error is high even for those who know copyright law and government acquisition regulations.
Unlike public domain works, works of the U.S. Government are excluded from copyright protection at the time of creation and are not dependent on dates or term algorithms.
Posted by: BKlein | May 10, 2009 at 08:51 AM