
Court cases

May 27, 2009

Why the Google Books Settlement is better than orphan works legislation

There has been a lot of talk by some of the critics of the Google Books Settlement (GBS) about how it gives Google a monopoly on orphan works.  But most of the commentators who have talked about orphan works and the Google settlement are sloppy in their language.  In this post, I want to clarify the language and then make a stab at some numbers.

There are two sets of books governed by the settlement.  First, there are the in-copyright but out of print books whose rights holders sign up with Google.  We can call these "active rights holders."  Second, there are the in-copyright but out-of-print books whose authors do not register with Google or the Books Rights Registry: the "inactive rights holders." 

Some like to call this second group "orphan works," but that is wrong.  This latter group actually consists of two separate groups.  First, there are rights holders who could be easily located but who have chosen not to sign up with the Registry.  Foreign authors whose works are normally protected by their national reproduction rights organization come readily to mind.  I suspect that many could not conceive that their works could be used without their explicit permission and so see no need to register.  Others may not learn of the settlement in spite of Google's advertising campaign.  Because these authors could be easily identified and located, however, their works are not orphan works.  The other portion of the "inactive rights holders" subset consists of the true orphan works: works whose copyright owners cannot be located, either because they cannot be identified or because their whereabouts are unknown.

The scope of the orphan works problem

Trying to come up with numbers is a very challenging task, but here is a quick attempt to get some ballpark figures.  First, we need to look at the potential scope of Google's database.  The Lavoie article on the Google 5 said that WorldCat contained 32 million print book records in 2005.  I think that number is too high because we know that there is a tremendous amount of duplication in WorldCat, but let's use it as the outside potential limits of the Google database.   Bowker's Global Books in Print reports 18.5 million book items in print, which leaves 13.5 million titles that are out-of-print.   (Since that is a current figure and would include books published between 2005 and 2009, it is probably too high - but I also suspect that there are many in-print foreign titles that are not included.  Let's use it.) 

Lavoie reported that there were 5.4 million titles that were out of copyright (pre-1923), so we are left with roughly 8 million titles that are potentially in copyright but out of print.  (Some of these would be American works that have not had their copyright renewed and hence are in the public domain, but I think the number could only be 150,000 [1.7 million] at most, and so I am going to ignore that.)

[UPDATE: So I got a good question about the number of works that might have entered the public domain that pointed out that my original number is wrong.  Here is my thinking: of the 8 million books, half are in English (following WorldCat numbers) and hence are likely to be American works.  (I won't worry about books published only in England.)  Of those 4 million, 63% according to Lavoie are after 1963, and still protected by copyright.  That leaves 1.9 million works published between 1923 and 1964.  A 1961 copyright study suggested that maybe 9% of these works were renewed and still protected by copyright, though recent work by Michigan indicates that 41% of the works are still protected by copyright.  If we assume 90% are public domain, then 1.7 million works are public domain.  If 59% are PD, then 1.1 million are PD.  Let's call it 1.5 million - and the number of in-copyright but out of print works should drop from 8 million to 6.5 million.]
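The arithmetic in the update can be checked with a short sketch.  It takes the post's figure of 1.9 million works for the 1923-1963 period as given (it does not re-derive it from the 63% share), and the variable names are mine, chosen only for illustration.

```python
# Back-of-envelope check of the public domain range in the update.
# All quantities are in millions of titles and come from the post itself.
works_1923_1963 = 1.9      # figure stated in the post, taken as given

pd_share_high = 0.90       # post's assumption based on the 1961 renewal study
pd_share_low = 0.59        # implied by the Michigan figure (41% still protected)

pd_high = works_1923_1963 * pd_share_high   # upper estimate of public domain works
pd_low = works_1923_1963 * pd_share_low     # lower estimate of public domain works

pd_working_figure = 1.5    # the post's chosen compromise between the two
in_copyright_out_of_print = 8.0 - pd_working_figure

print(f"public domain range: {pd_low:.1f} to {pd_high:.1f} million")
print(f"in copyright but out of print: {in_copyright_out_of_print:.1f} million")
```

The working figure of 1.5 million is the post's own choice within that range, not something the calculation produces.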

So we are talking about 8 [6.5] million works published since 1923 that are in copyright but out of print.  Of those, how many are going to have inactive rights holders?  What percentage of authors are going to register with Google, and what percentage will ignore the call?  Or if we look another way - what percentage of these works are true orphans?

Denise Troll Covey's numbers might provide some guidance.  In CMU's random trial, she was unable to locate 21% of publishers.  (Their random sample was not limited to out-of-print books, so the percentage might actually be too low.)  If we assume that number would hold on the 8 [6.5] million, that would mean that we have about 1.7 [1.4] million true orphan works in the total database of 13.5 [12] million.  (That number might actually be smaller since some rights holders other than publishers might come forward via the settlement.)  The remaining 11.8 [10.6] million books would have rights holders who either registered with Google or chose not to register.
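For readers who want to rerun the estimate with different assumptions, here is a minimal sketch of the calculation.  The inputs are the figures quoted in this post; the function name and its parameters are hypothetical, introduced only for this example.

```python
# Rough orphan works estimate, in millions of titles.
# Inputs are the figures quoted in the post; the function is illustrative only.

def estimate_orphans(in_copyright_oop, out_of_print_total, unlocatable_share=0.21):
    """Return (true orphans, remaining books), assuming the CMU 21% rate holds."""
    orphans = in_copyright_oop * unlocatable_share
    remaining = out_of_print_total - orphans
    return orphans, remaining

worldcat_records = 32.0     # Lavoie's WorldCat print book records, 2005
in_print = 18.5             # Bowker's Global Books in Print
pre_1923 = 5.4              # Lavoie's out-of-copyright titles

out_of_print = worldcat_records - in_print    # 13.5
potentially_in_copyright = out_of_print - pre_1923   # about 8.1, rounded to 8 in the post

# Original figures: 8 million in copyright, 13.5 million out of print.
orphans_orig, remaining_orig = estimate_orphans(8.0, 13.5)
# Updated figures: 6.5 million in copyright, 12 million out of print.
orphans_upd, remaining_upd = estimate_orphans(6.5, 12.0)

print(f"original: {orphans_orig:.1f} million orphans, {remaining_orig:.1f} million remaining")
print(f"updated:  {orphans_upd:.1f} million orphans, {remaining_upd:.1f} million remaining")
```

Changing `unlocatable_share` shows how sensitive the orphan count is to the 21% figure, which the post itself flags as possibly too low or too high.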

Even with orphan works legislation, most of these works would not be eligible for inclusion in a digitized books database, since their rights holders can be located and they are therefore not true orphans.  The Google Books settlement is the only way to get cost-effective access to them.

What we need in the settlement is a compulsory license that would allow anyone to license the use of a work held by a non-active rights holder, and not just orphan works.

April 21, 2009

Google Book Settlement, orphan works, and foreign works

Much of the emerging criticism of the Google Book Settlement is focusing on the fate of "orphan works" in it.  Pamela Samuelson in her forthcoming ACM article on the settlement notes that "the settlement would, in effect, give Google the exclusive right to commercially exploit millions of orphan books."  The Internet Archive's request to be a defendant in the suit states that the proposed settlement "effectively limits the liability for the identified uses of orphan works of one party alone, Google Inc."  James Grimmelmann notes that "Google’s extraordinary market power under the settlement will come from its unique lock on orphan works."  But by focusing on orphan works, these critics downplay the impact on the real losers in the settlement: the thousands of foreign authors whose books can be exploited with impunity by Google and the Books Rights Registry.

A common assumption among the critics of the settlement is that there are two classes of rights holders.  The first are those who come forward and register with the Registry - what were called "active rightsholders" at the Columbia conference on the settlement.

The other class of rights holders are those who fail to opt-out of the settlement and also fail to register with the Registry.  The existence of these rightsholders is therefore unknown to either Google or the Registry.  It would be a mistake, however, to equate in-copyright but out-of-print works owned by these non-active rights holders with orphans. 

Orphan works are those works whose copyright owner cannot be located, either because the current owner cannot be identified or cannot be found.  There are many books that would be included in the Google settlement that are true orphans.  There are many more, however, whose copyright owners are eminently locatable - but who aren't participating in the settlement.

The largest group of non-active rights holders is likely to be foreign authors.  In spite of Google's efforts to publicize the settlement abroad, I suspect that most foreign rights owners of out-of-print books will fail to register with the Registry.  There are a couple of reasons for this.  For one, they may not know that their book is still protected by copyright in the US.  In addition, they may assume that the international network of reproduction rights organizations would manage their royalties, and not understand the need to register separately.

The scope of the foreign land grab could be considerable.  Some initial estimates suggest that 7 million books could be included in the settlement.  Of these it is estimated that 1 million are in the public domain.  That would leave 6 million in-copyright but out-of-print books.  Early efforts to try to understand the nature of the library collections that were being used to build the Google books database suggested that 50% of the works in the libraries were not in English, so it would be safe to say that at least 3 million of the books in the settlement will be foreign works.  (Since Google added many European partners after this study was done, the number is likely to be much higher.)  Some of these are going to be orphan works - but many more are going to have easily locatable rights holders that have chosen not to be active participants in the settlement.  Their royalties are destined for the pockets of the Registry.  I am willing to bet that a goodly percentage of the operating expenses of the Registry will come not from orphan works, but rather from foreign authors who do not understand the need to participate in the settlement.

Pamela Samuelson has questioned the representativeness of the plaintiffs in the settlement.  If there is an injustice being done in the settlement, it is with foreign authors.

April 09, 2009

The Google Book Settlement and the Public Domain

Shortly after the Google Book Settlement was announced, a colleague wrote to ask what I thought of its procedures for identifying public domain books.  I gave a quick response, but promised a more detailed analysis.  Here it is, months later. 

My quick assessment: the settlement specifies procedures that are likely to identify most public domain works published in the United States.  It is less helpful for foreign publications that may have entered the public domain; they are largely absent from the process.  Unfortunately, because this is part of litigation rather than legislation, no one else can take advantage of the results of the process - it moves us no closer to having a growing public domain.  What is unknown is to what extent Google will want to remove titles from the licensed products and make them freely available to the public.

Continue reading "The Google Book Settlement and the Public Domain" »

May 07, 2008

The Georgia State E-reserves lawsuit: Is Fair Use Dead? Or is it the traditional publishing model?

From Raizel:

Though Peter has just posted on the Georgia State lawsuit on fair use, the AAP’s risk, and end users, my post will focus more on the difficulty in defining the boundaries of fair use for libraries and institutions that are attempting to make owned works available online, the potential options libraries have at this point, and why moving towards open access might (eventually) help to solve this impasse.

Fair Use

According to the complaint, "Georgia State's general copyright primer ...affords "fair use" parameters -- that is guidelines as to allowable copying without permission -- that plainly exceed legal boundaries." And what are these so-called illegal policies (so-called because there is no statutory boundary of fair use)? Georgia State

"endorses up to twenty percent of a work -- a benchmark that would countenance unlicensed excerpts of dozens or even hundreds of pages from a given work."

As Peter points out, Georgia State’s percentage is based on “the state's guide to understanding copyright - developed by a committee of experienced lawyers and educators.”

While the percentage allowed under Georgia State's interpretation is larger than that of more conservative standards at other libraries, as the Northwestern University copyright blog (Claire Stewart) states,

"It is interesting that the publishers [in this lawsuit] are not at all specific about their thresholds for acceptable use, leaving us to wonder whether they would consider any reserve use fair."

Some publishers do have stated guidelines for what they consider to be “fair use” for both e-reserves and use of materials in the classroom, often with stated percentages. However, from what I was able to easily discern (corrections are welcome!), the three publishers involved in this lawsuit do not have stated fair use guidelines readily accessible on their websites or elsewhere. (“Except as provided under national law, written permission is required to photocopy all Cambridge publications” and stated guidelines for reuse of materials by their own authors)

So what about using more restrictive guidelines than those at Georgia State, as done by other libraries, say not more than ten percent of a work? While that would allow for less use, it still doesn't address the elephant in the room -- that the library has already paid a great deal for the materials placed on either physical or electronic reserve. Either through purchase of usually a physical book or journal, or through licensing, libraries have paid, are paying, and will continue to pay publishers for the use of materials. Of course, ownership of physical copies does not allow for copying without fair use consideration – but the way libraries have been and continue to buy physical copies demonstrates their willingness to pay for copyrighted information.

In the case of licensing, often publishers tie the price of licensing a database to the enrollment numbers for the entire institution -- not just for those likely to be using the product, increasingly switching from a per-simultaneous user license to a full-time enrollment (FTE) license.  So academic libraries frequently have to choose to license a database to the entire institution rather than just for the small number of interested users – or do without. So it is probable that at least for some of the items in the complaint Georgia State has already licensed and paid for access to those items for more than those students who are using the e-reserves version of those articles.

One of the additional complicating issues revolves around who is responsible for creating and maintaining e-reserves. As my co-blogger Peter Hirtle stated in a Library Journal article last year,

"Since our AAP agreement, we don't even have separate e-reserve policies any more,” explains Peter Hirtle, intellectual property officer for the Cornell University Library. “We have electronic course content copyright guidelines.” That change, he says, reflects the recognition that “electronic course content” can reside in a system managed by the library, or in the Blackboard course management system, or on an entirely separate system maintained by the faculty, including personal web sites."

Effect on the market?

The fourth fair use factor is definitely in play here -- the effect of the use upon the potential market. As these three publishers see it, the market for academic materials used for classes is student classroom use, and therefore individual students should purchase any book or journal article that in any part is being used for a class. And as Peter points out in his blog post on this case, at present, the market is not seen as being impacted when individual students make copies of the same chapter in a book during the same semester or over years. From the perspective of the publishers, by hosting e-reserves, libraries are acting as the agents of individual students and professors and therefore the libraries should be required to purchase licenses for every single individual use of these materials. As Peter asks, “does fair use really disappear whenever there is a market that licenses reproductions?”

However, libraries view their “market” differently – to make materials accessible by being the consumers and collectors of academic materials, to be used in the short term and the long term, for the benefit of both individual users and all possible users. This viewpoint allows libraries to serve both traditional library roles (purchasing physical items for subsequent theoretically unlimited loaning) and newer library roles (licensing of materials for an entire institution or for a specific class).

If the publishers’ view of the market wins, it will decimate much of library-based fair use.

Possible Options for libraries

So let's think about where this leaves libraries and institutions before there is a settlement or decision:

Option 1: Pay for everything posted online through the CCC or some other service. While the upside is that the institution won't likely get sued, the strong downside is paying again for items owned or licensed by the library -- and the abandonment of fair use.

Option 2: Have the entire institution carefully redo its entire policy on online posting of materials. Include information about percentages of material available to be posted (though that didn't save Georgia State) and how long materials will be posted (many libraries will only post materials for one semester/quarter based on fair use before asking for permission). 

Option 3: Only have links to online content licensed by the library and limit information beyond links in e-reserves to public domain materials. Require that all licenses either explicitly allow or do not forbid the use of links in course reserves. The upside is that no copyrighted content is copied, but once again, fair use loses out.

Option 4: And then there's the most difficult option for an academic library -- saying no to faculty members by not having any e-reserves at all. Dorothea Salo at Caveat Lector states that "If I were the Georgia State library, I'd play hardball. No e-reserves for anybody, and let faculty go whine at the AAP."

She previously posted in 2005 that libraries should say no to e-reserves until professors and authors understand the cost of publisher-directed "no fair use":

"Call out the AAP from behind the curtain. Look faculty in the eye and say, calmly, 'no, we can't put this on e-reserve, because fair-use is endangered everywhere and the AAP is making lawsuit noises -- but why don't you and I contact the article authors and ask if they'll post a preprint we can link to? And by the way, are you posting your own preprints for others?'" Salo said libraries must "draw a thick black line connecting what faculty do and what they have access to, because right now they don't see it." (via Open Access)

Open Access?

Option 5: The most difficult option of all is to change the dynamic between libraries, authors, and publishers. More than the other options, this is a dramatic overall policy shift; it has less to do with copyright or fair use than with changing the question. Here the question is “How can professor authors help to make their work available for professors and students to use in the classroom and for scholarly work?”

One means of making more information accessible is through open access, defined by Peter Suber as “Putting peer-reviewed scientific and scholarly literature on the internet. Making it available free of charge and free of most copyright and licensing restrictions.” Harvard University and other institutions are moving in that direction. In an era where more and more information is seemingly available for free on the internet, placing peer-reviewed information where “anyone, anywhere, with access to the Internet may read, download, copy, and distribute that article” seems like a great idea. But as Peter Suber states, open access is not a panacea; creating works still takes the efforts of authors and others.

Open Access leaves the beginnings of changing the dynamic in the hands of authors, but there is still a role for publishers (who can help with the selection, peer review, and editorial process), and libraries. Libraries can serve as institutional repositories for works written by their faculty and staff, help institution-based publishers, and will continue to purchase works.

As a great example of the possibilities of open access in practice, with a publisher working with an author and libraries, MIT Press has made John Willinsky’s The Access Principle: The Case for Open Access to Research and Scholarship, published in 2005, available for download and for purchase, with a link for finding it in a library. And at the time I’m writing this, the library copy nearest to me is checked out!

Open Access is not a cure-all, but thinking about access to e-reserves only as a copyright versus fair use issue obscures the present imbalance in the world of scholarly publication. The Duke Scholarly Communications blog puts this issue bluntly:

"The real irony is that [the lawsuit] is justified as an attempt to remedy a “free-rider” problem — the claim that universities are appropriating the work of publishers and authors without just compensation. This claim is patently absurd, given the amount of money university libraries invest in published resources, but it is downright offensive when the real issue is clarified. Publishers here are themselves the free-riders, obtaining a huge amount of academic content from the universities and their faculty without compensation. The GSU complaint cites as an irony the fact that one of the professors who is cited as infringing the copyright of Sage Publishing has himself published three articles in Sage journals. The gall of the man! Nowhere is it mentioned that he was required to give up those articles without payment for the privilege of publishing with a company that is now suing his employer to recover even more money for those freely donated articles." (emphasis added) 

Fulfilling the promise of an Open Access system for publishing scholarly works will take years. However, fair use is designed to be flexible enough to fit many different types of uses, such as e-reserves. I hope fair use continues to be as flexible after this case either settles or is decided.

Cross-posted at the Copyright Advisory Network blog 

May 04, 2008

Georgia State: Don't Forget the User

From Peter:

I have held off posting anything about the Georgia State e-reserves case, figuring that others would say the most important things.  That has happened.  Claire Stewart has a wonderful post outlining the major issues and relevant literature, and she advances a strong argument why GSU's use could be a fair use.  I particularly like that she lists the schools of the authors listed in the complaint.  I wonder how many of them know - and would approve - that their work is being used to bring legal action against their colleagues at GSU who only admire their work enough to want to use it in class?  I think it is particularly ironic that at least two of the authors are at UC San Diego, which faced its own challenge from the AAP. 

Other postings of note: Kevin Smith explains why sovereign immunity is not an issue in the case, and Georgia Harper talks about the impact the case may have on open access. 

While these three postings provide all you really need to know about the case, there are lots of other issues that will emerge from it.  Let me tackle three: fair use, the risks the AAP faces, and most of all, the implications for everyone other than libraries.

Continue reading "Georgia State: Don't Forget the User" »

January 02, 2008

Defamation and Oral History: Update

Christmas came early in an important case in the California Supreme Court that this blog has been tracking.  In the case of Hebrew Academy of S.F. v. Goldman, the court reversed an earlier opinion concerning oral histories found in the Bancroft Library at UC-Berkeley.

Two issues were at play in the case.  The first concerned whether the admittedly limited distribution of copies of the oral histories in question was enough to activate California's "single publication" rule, which normally limits defamation actions to within the first year after initial publication.  The court ruled that the Bancroft did indeed "publish" the oral histories many years ago.  In addition, the court rejected the plaintiff's argument that the clock to file an action should only commence when  an alleged defamation is discovered.  Because the statute of limitations has expired, there is no need to determine whether the statements in the oral history were actually defamatory.

The decision is basically good news for libraries in California that might republish or distribute potentially defamatory material.  One year after publication, institutions can breathe easy, knowing that their risk has decreased.  This is the primary reason that a coalition of library organizations and the Society of American Archivists filed an amicus brief in support of the defendants.  Furthermore, my employer (Cornell University) has included the decision in its defense of a million dollar defamation suit against it for digitizing a newspaper from the 1980s that allegedly contained defamatory material.   The new decision would seem to suggest that publication occurred in the 1980s, when the paper was first printed, and not in 2007, when the paper was scanned and added to Google.  The fact that the individual did not know at the time that the statements had been published is immaterial if the possibility exists that he or she could have learned of their existence.

On the flip side, the case does raise the possibility that for at least one year, oral history programs may have to take editorial responsibility for the statements that are included in oral histories.   Libraries with oral histories will have to take care that they do not unintentionally disseminate falsehoods even as they seek to present the unvarnished words of the interviewee.

November 14, 2007

Question to readers: Where do you go to find key copyright cases?

I'm looking to see if someone has put together a list of key copyright cases in the U.S.  Ideally, this list would be in a table or database so the user can sort by date, case name, court etc.   

Where do you go to find a list of key copyright cases (free, online sources only)?


p.s. besides the Stanford Fair Use & Copyright page, which we're presently updating.

October 11, 2007

Update: Court documents now available for RIAA case against Minnesota woman

Thanks, Nick!  Nick Moline at Justia just made the actual court documents in the RIAA trial available: http://news.justia.com/cases/239134/

August 22, 2007

Working at Justia today -- see what I found out!

I'm working at the Justia office today, and had a conversation worth sharing.

Conversation with Nick Moline, programmer at Justia

Moline:  We just added full text decisions, orders and opinions for federal district court cases if they've been filed at PACER.  This is in addition to the full documents that we post for featured cases.  We put a picture of a gavel <http://cases.justia.com/federal/district-court/>  next to the cases that have decisions in the files.

Minow:  Really!  Are all federal district court opinions available now?

Moline: The courts are supposed to post them, and most, but not all of the opinions are online. [Ed. note: According to PACER, this functionality "will only be available in courts that have installed District Court CM/ECF version 2.4 or higher, and will only provide free access to opinions filed after the court is actively using version 2.4"]

Minow:  And these are available for free?

Moline:  Yes, free.  They're already available free via PACER, but the general public wouldn't necessarily have access to the PACER database without registering.  Also, the PACER database is only searchable by date.  We cross-reference the decisions with our database of federal case filing that we've already fetched.  That means you can search by party name and case type. Soon we'll add more search capabilities.

Minow: Will we be able to search full text?

Moline: In some cases, yes.  Some of the documents are scanned, but others are converted from Word documents.  Most of the decisions are full-text searchable.  The lawyer-submitted documents and filings are often still scanned documents at this point. You should see some of the handwritten docs from some of the pro se parties.  But most of the court decisions are fully indexable. We use Google's Custom Search Business Edition, and I'm going to add a filter to focus on the cases to help folks search.   Google has already indexed about a thousand of the cases, and the results show up in a regular Google search.  For the scanned documents, Google does OCR the document to make it searchable if they can.

Minow: What dates do you cover?

Moline:  Today we have documents from 2006+.  Later this week we'll have the 2004 and 2005 cases. We will move back from there.

Minow: Does that mean I can go back and search with "library" as a keyword and find those cases?

Moline: Yes. If "library" is part of the party's name, you'll find the cases. If "library" is in one of the full text-searchable documents, you'll find those too - once Google has indexed them, which should be in a week or so.

Minow: When is the database updated?

Moline:  We run daily updates to update the database with new orders and opinions.

Minow: You know, I just had my own experience with that.  I subscribe to the cases that have been tagged "library" i.e.  http://news.justia.com/cases/library/.  On Aug. 20, I saw a decision had been issued in a library case that same day. I blogged it on Aug. 21, and got more hits on that post than any other in recent memory. Thanks for talking with us, and thanks for the great service!

August 21, 2007

Judge rules for library in patron lawsuit

A federal judge in Ohio ruled in favor of the Clark County Public Library on Aug. 20 with regard to the lawsuit by a patron who contested a ban on his use of the library for two years.

For great detail, see the court filings at Justia.

Also, Justia has updated court documents on other library cases there.  If you know of other pending federal district court cases of interest to libraries, let me know and I'll see if they can be added.