Is Google Making the Celera Mistake?

Celera was the company founded by Craig Venter, and funded by Perkin Elmer, which played a large part in sequencing the human genome and was hoping to make a massively profitable business out of selling subscriptions to genome databases. The business plan unravelled within a year or two of the publication of the first human genome. With hindsight, the opponents of Celera were right. Science is making and will make much greater progress with open data sets.

Here are some reasons for thinking that Google will be making the same sort of mistake as Celera if it pursues the business model outlined in its pending settlement with the AAP and the Authors Guild:

  1. The task and the cost of curating the data cannot be separated from the responsibility and the expertise of those who generate it. Celera’s hope for massive private value in its private databases was undermined by the preference for publicly funded research to go its own sweet way into the public arena. Does Google really want to manage, control and assume responsibility for all those who write books and for how their books can be distributed? Do Google and the Book Rights Registry really think that authors want their activities to be regulated in this fashion?
  2. Genomic databases are extraordinarily valuable, but it does not follow that you can sell them as big-ticket items. Is there a massive market out there for closed subscription databases of millions of books sold to institutions? Celera did make some sales of its promised proprietary databases, but it was never believable that there was funding available to support a market of billions of dollars per annum for genomic databases. Those chimerical numbers were needed to support the astronomical market cap Celera briefly touched. Google may not have such sky-high expectations of its digital library subscription revenues, but I wonder how well the expectations that it does have match the funding currently available to the public library system and educational institutions.
  3. PE was very good at building automated sequencing systems and selling them to researchers. Very, very good. It turned out to be not nearly so good at building a business to manage, curate and exploit genome databases that would be licensed to scientists and researchers. Such different activities do not mix, and your customers are likely to suspect a conflict of interest; this is one reason why Celera was spun out from Perkin Elmer. Google is very good, six times “very good”, at managing search-sensitive advertising and large-scale intentional databases drawn from web use. Are Google’s customers going to be happy working with a system in which their reading attention and referential record are always being calibrated and used to influence their buying patterns and subscription budgets?
  4. Hubris. Almost certainly in the case of Perkin Elmer, but they did have the sense to pull back. With Google it is hard to say… hubris and ambition are sometimes confused, or mistaken, the one for the other.

There are plenty of differences between these two situations. Nor am I suggesting that all literary copyrights should be put into the public domain (nor indeed should all genomic data be treated as public). Differences and contrasts abound, but Eric Schmidt should put Sulston and Ferry’s book The Common Thread on his summer reading list.

A Democratic Quality to Digitization

Robert Darnton used this interesting phrase in his recent NPR comments on the Google Book Search project and the Settlement (a 7 minute interview here).

There is a democratic quality to digitization but if those supplying it are simply trying to maximise profits the whole thing could turn sour. (The Infinite Shelf .. On the Media, 27 March 2009)

Is it true that there is a democratic quality to digitization? I think there may be a profound truth there, and getting at it may do something to reduce or quieten Darnton’s worries. He is right to be worried. If Google were to become the predominant and monopolistic supplier of books (and other print and digital print resources) through the web, that would be a disaster. But that is a big if, because digitization of our print heritage is a broadly democratic shift. It is a democratic shift in much the same way as the invention and adoption of print led to, or was one of the necessary preconditions of, the democratic thrust of the Enlightenment (see Darnton’s original post on Google & the Future of Books). Darnton rightly points out that the democracy of the Enlightenment was partial and restricted in its reach by privilege:

Far from functioning like an egalitarian agora, the Republic of Letters suffered from the same disease that ate through all societies in the eighteenth century: privilege. Privileges were not limited to aristocrats. In France, they applied to everything in the world of letters, including printing and the book trade, which were dominated by exclusive guilds, and the books themselves, which could not appear legally without a royal privilege and a censor’s approbation, printed in full in their text. (NYRB 12 February 2009)

The web is putting the final nail in the coffin of the restrictions which confined the privileges of print (bolstered by the legal privilege of copyright) initially to men (rather than women), to the rich and then the wealthy, and to the formally educated, and which even now in our own time exclude some. The democratization implicit in digitization works in two ways. It works for the universality and openness of distribution because it is now a fact that digital copies and digital access are available at marginal cost for everyone. A lot more stuff will be free, partly because advertising which accompanies or supports it can generate profits, but also because it really is dirt cheap to provide free web access. So cheap that, for anyone who provides digital services, providing some services and some access to content for free is a no-brainer. Digital access is strikingly open and democratic in its thrust because it actually (and obviously) costs more to exclude someone or anyone from access to a web resource than to enable it for everyone. ‘Open’ is simply, for the supplier, the lowest cost access model on the web. Authenticating, selling, registering for or targeting access costs more. But the democratic bias of digitization works also at the point of creating digital resources. It is much easier to create and if necessary re-create digital resources than to look after them in any other way.

Moving from the democratic thrust of free access to the process itself: digitization is like a printer’s press in that it enables us to originate digital masters. Digitization as a method of data capture, a means for transforming cultural objects to web presence, is also becoming more feasible and more necessary. Digitization as a process is democratic because it is repeatable and reliable and affordable. Digitization is also likely to be of higher quality if it is various and competitive (Google’s problems with quality of capture are notorious). Digitization as a transformative process, relying on software, computers and scanning instruments, is becoming easier and cheaper at something close to Moore’s law. Even Google’s massive digitization project is now much easier and cheaper than it was when they started. Since digitizing books (films, works of art, music etc) is becoming more affordable and easier every year, we should have more of it. We will probably soon have consumer-targeted, hand-held, intelligent scanners.

The real danger in the Google Book Search service and the Settlement is that libraries and publishers should start to think that digitization is best left to the uniquely specialised Google. To prevent a monopoly we need a choice of services which digitize books and print resources and serve them openly (or as commercial services) to audiences through the web.

I think Darnton is right: there is a democratic thrust to digitization, and it is in all our interests that there should be lots of alternatives to the digitization engine that Google has created with the help of the New York Public Library and the Oxford, Harvard, Michigan and Stanford University Libraries (of course many more universities are now in the Google ship). Surely the Google Books Library (for it is rapidly becoming that) needs to be watched so that it does not become an engine for monopoly pricing, but the best safeguard against this is to create and sustain alternatives. Having played a part in kicking off the Google initiative, Harvard can help the next and better proposition that comes along. Darnton as Harvard’s librarian should be there to support it.

Google Books Search: What is Good for Google is Good for the USA

There was an important conference on the legal and publishing impact of the Google Books Settlement at Columbia Law School on Friday. Several attendees, led by Peter Brantley, were actively Twittering the event; see #gbslaw for the Twitter-stream. There are two very useful reflective summaries posted by Peter Hirtle (lawyer at Cornell Library).

Apparently one of the recurring themes in the conference was this mantra: “What is good for Google is good for the USA.” I am sure that it was said in jest or irony, but that must nevertheless have made the Google participants unhappy. Even if ironic, the comparison is wounding. Just now, being compared to General Motors is nearly as bad as being compared to AIG, and is frankly worse than being compared to Microsoft (which would also be very unfair and unwelcome to Google, but the comparisons are coming…). The mantra is especially unfortunate, since it is far too close to the bone: the whole way the Google Book Search settlement is working out is far too US-centric, as though Detroit were the market, and the accessibility of digital books in the rest of the world were not a matter of importance to the US or to Google. General Motors has been building inefficient and slipshod cars which had limited appeal in the rest of the world and failed the ultimate tests of quality engineering and sustainability. Could Google fall into a similar trap of building too much, too wastefully, for local demand and national circumstance without full attention to all the factors which build quality, openness and sustainability? Apparently some anxieties on this score were raised at the meeting. Somewhere in the Twittering I saw someone questioning how the US would feel if another country adopted a similar approach: a private enclosure and database representation of all the books in the English language held by, say, French libraries (a French or, even more probably, a Chinese Union Database Library? It will probably happen). Can you imagine the uproar? Representative Conyers would have most-unfavoured-nation legislation in train within a twinkling…

A lot of the books from these dusty stacks in Michigan and California are foreign published. Through the group of libraries in the US with which it is collaborating, Google will catch in its net of NotYet OutOfCopyright but OutOfPrint titles a vast swathe of books originally published by British, French and German publishers. Google has apparently spent $7 million in the last two months on press advertisements in over a hundred countries to advise authors and publishers of the rights that they may have in the Settlement to the use of their books in the US market ($7 million on print ads for the legal notice; few text database projects have had a total investment this large). But the authors of those books are also readers. What if the eventual legal and technological effect of the Settlement is to make access to those books much less viable in the countries in which they were written or published?

Spare a thought for Google: not only is it being compared to General Motors, it now also has to deliver on the very substantial obligations which the Settlement imposes on it, in particular to roll out commercial services to libraries and to individuals (to reiterate: these obligations are only to deliver services to the US market). This is going to keep Google very busy. Many critics of the Settlement have pointed out that it creates an enormous (millions of books) private preserve for Google, from books which look more like they belong to the public domain, either because they are orphans, or because they are close to orphans. This monopolistic position is seen as an obstacle to competition. Of course it is in one way a matter of enormous advantage for Google.

But there is another way of looking at the situation. Google is now under the obligation, and the heavy public expectation, of delivering services from this massive collection. I believe that it will be under a very heavy public expectation and moral obligation to deliver, or find some legal way to enable, similar services to overseas markets. Google has assumed an onerous obligation to curate and deliver services for a large class of legacy titles. Inevitably it has been taking short-cuts: there is a weird absence of metadata, it has missed some quality goals, the books are not always exciting, and many of them are out of print for good reason. I suspect that the difficulty and the importance of this legacy task will in itself make it impractical for Google to be the innovator in the book space that it might like to become. It is much easier to deliver an innovative and truly revolutionary social service for book readers when you are not curating 10 million titles. Hirtle concludes his excellent notes with this:

Yet while there may be great disappointment with the process used to generate the settlement, I also detected no incipient revolution against the settlement itself. No one was calling for rights holders to register and submit comments to the court (as they can do until 5 May). No one was saying the court should reject it and tell the parties to start over. Yes, the class may be too large and the mechanism too crude, but we created this problem when we abandoned formalities, lengthened copyrights, and started treating every copyrighted item in the world like it was a Disney movie. Given this procrustean bed we have made for ourselves, the settlement may be our only way out. Yes, Congress should create a compulsory license authorizing the use of out-of-print books – but don’t hold your breadth waiting for that. In the interim, the settlement may be the best we can hope for – even though it has the potential to radically alter all of our worlds. (Hirtle: Library Law Blog)

Google will probably get its way, for the most part, with the Settlement, but it may also find the bed it has made for itself, with the aid of publishers and the Authors Guild, somewhat procrustean. The tasks it faces are Herculean. It will surely get a lot of attention from lawyers (within and without the business). There will be worries about monopoly and anti-trust, but there will be plenty of competition.

Google Book Search and the Tragedy of the Anti-Commons

Michael Heller, a property lawyer at Columbia University, has coined the term the ‘tragedy of the anti-commons’. This is a twist on the more familiar idea of ‘the tragedy of the commons’ — which is thought to be the cause of such ecological disasters as the implosion of fisheries, perhaps even the nearing apocalypse of global heating. Heller’s insight is that too much private ownership can be as much of a problem as too little: “When too many owners control a single resource, cooperation breaks down, wealth disappears and everybody loses.” He gives plenty of examples in his book The Gridlock Economy — the book’s argument is forcibly stated in its subtitle: How Too Much Ownership Wrecks Markets, Stops Innovation, and Costs Lives.

There is a good chance that the Google Books Settlement is going to show us all how this tragedy of the anti-commons works out in the world of books. The Google project, which is backed by the American publishers and American authors’ representatives, should be (in my view will be) a wonderful resource for American universities, schools, public libraries and through them for American consumers. By 2011, if the Settlement is approved, at least 5 million out of print but not yet out of copyright [OOPnotYOOC] titles will be available to readers in the US market. This resource will have little opportunity to work so well for authors, readers and consumers in the rest of the world. The books will by and large not be available in the rest of the world (perhaps in American embassies?).

Google is already serving a very different and vastly narrower view of Google Book Search to the rest of the world (even to Canada and Mexico). Books which are public domain and wholly visible and readable in the US are not visible and readable elsewhere. And this copyright caution about territorial rights is unlikely to change, because the Settlement, when it is approved, is only going to be approved and agreed for the US market. Google has been persuaded (or has volunteered?) to accept the territorial restrictions and complications inherent in the market of copyright books. In my view, Google will not risk starting court actions in other jurisdictions, for the very simple reason that they might be lost, or worse still settled on a different basis from the US dispute. Google will be bound to leave the ex-US position of its wonderful aggregate of unloved (mostly ‘orphan’) copyrights in a national limbo. The orphans will remain unloved outside the 50 states.

The complexity of the rights situations of these millions of titles is effectively unmanageable and un-negotiable, which is pretty much what Michael Heller means by a tragedy of the anti-commons. By developing and growing an intricate and incredibly complex system of rights for different legal regimes and market territories, the publishing industry has produced a system where a negotiated and innovative new service is probably impossible. It would take something like a new Berne convention on copyright to make this a level playing field for all jurisdictions.

One might say that this hopeless and impenetrable thicket of rights which are largely historical and dormant is a problem for the rest of the world and for scholars outside the US. It is not a problem for the US, or for Google. Well, maybe… but it is also possible that this lack of international and global relevance will undermine the authority and the prestige of a US-centric resource. I wonder whether US scholars will accept a situation in which citations and references cannot be made and verified in a global context?

There is another dimension to the impenetrable complexity of the rights position of OOPnotYOOC titles: illustrations and photographs in these titles are in effect excluded from the scope of active exploitation by Google. Interestingly enough, Children’s Book Illustrations are to be treated differently. They are defined as ‘inserts’ and therefore fall within the scope of the settlement and will presumably be in the searchable and readable services that Google produces. But in the place of ordinary illustrations and photographs in books which are not ‘Children’s Books’ we should expect gaps or blanks, such as one already finds in the Google Book Search service. Eric Rumsey thinks that I may be on my own in reading the Google Settlement this way, but an apparently well-informed anonymous commenter makes a similar point in a comment on the Martyn Daniels blog. Why should illustrations in Children’s Books be treated differently from those in other books? I suspect that the publishers and the Authors Guild felt that they could negotiate with certainty on these rights (as also on quotation rights, rights in poetry etc) but they knew that they could not negotiate for the owners of artistic rights.

Will it matter that Google Book Search, when it is marketed as a commercial subscription service for libraries and universities, cannot be accessed or read in the world at large? Will it matter that many of the photographs and illustrations in millions of the OOPnotYOOC titles will not be there? Yes, it will matter, and that it matters will be another instance of the tragedy of the anti-commons.

Copyright Czar and Copyright U-turn

The outgoing US President has signed a bill which creates a US Copyright Czar. See the PC Magazine report.

Bush signed the Prioritizing Resources and Organization for Intellectual Property (PRO-IP) Act, a measure that will create several new government enforcement positions.

If it turns out that it is President Obama who is in fact charged with appointing this Czar, there must be a small chance that Lawrence Lessig will fill the post. That would be an ironic turn of events.

As a copyright loyalist, one has to recognise that it is often the supposed advocates and defenders of copyright who are its worst enemies. Copyright would be stronger if it were more permissive and less onerous. Witness the ludicrous decision of the German courts, which have ruled that Google is infringing copyrights when it includes thumbnail images in search results. Whatever the technicalities of the German law on this point, we can be certain that incredibly useful services such as Google image search will not be derailed. At a certain point technology simply plows on and works its way around obstacles of this kind.