Is Google Making the Celera Mistake?

Celera was the company founded by Craig Venter, and funded by Perkin Elmer, which played a large part in sequencing the human genome and was hoping to make a massively profitable business out of selling subscriptions to genome databases. The business plan unravelled within a year or two of the publication of the first human genome. With hindsight, the opponents of Celera were right. Science is making and will make much greater progress with open data sets.

Here are some reasons for thinking that Google will be making the same sort of mistake as Celera if it pursues the business model outlined in its pending settlement with the AAP and the Authors Guild:

  1. The task and the cost of curating the data cannot be separated from the responsibility and the expertise of those who generate it. Celera’s hope for massive private value in its private databases was undermined by the preference for publicly funded research to go its own sweet way into the public arena. Does Google really want to manage, control and assume responsibility for all those who write books and for how those books can be distributed? Do Google and the Book Rights Registry really think that authors want their activities to be regulated in this fashion?
  2. Genomic databases are extraordinarily valuable, but it does not follow that you can sell them as big-ticket items. Is there a massive market out there for closed subscription databases of millions of books sold to institutions? Celera did make some sales of its promised proprietary databases, but it was never believable that there was funding available to support a market of billions of dollars per annum for genomic databases. Those chimerical numbers were needed to support the astronomical market cap Celera briefly touched. Google may not have such sky-high expectations of its digital library subscription revenues, but I wonder how well the expectations that it does have match the funding currently available to the public library system and educational institutions.
  3. PE was very good at building automated sequencing systems and selling them to researchers. Very, very good. It turned out to be not nearly so good at building a business to manage, curate and exploit genome databases that would be licensed to scientists and researchers. Such different activities do not mix, and your customers are likely to suspect a conflict of interest; this is one reason why Celera was spun out from Perkin Elmer. Google is very good, six times “very good”, at managing search-sensitive advertising and large-scale intentional databases drawn from web use. Are Google’s customers going to be happy working with a system in which their reading attention and referential record are always being calibrated and used to influence their buying pattern and subscription budget?
  4. Hubris. Almost certainly in the case of Perkin Elmer, but they did have the sense to pull back. With Google it is hard to say… hubris and ambition are sometimes confused, or mistaken, the one for the other.

There are plenty of differences between these two situations. Nor am I suggesting that all literary copyrights should be put into the public domain (nor indeed should all genomic data be treated as public). Differences and contrasts abound, but Eric Schmidt should put Sulston and Ferry’s book The Common Thread on his summer reading list.

The Twitter Book Club

At the weekend we were startled to find that Jonathan Ross had a stunningly good idea. An idea that should have occurred to us sooner. Wossy, who twitters a lot, thought about starting a book club and one of the books he mentioned for his wheeze was the Bloomsbury book, Kate Summerscale’s The Suspicions of Mr Whicher. This mention alerted us because that is one of the titles in the Bloomsbury Library offering that uses the Exact Editions platform. One of my colleagues either follows wossy, or has a Google Alert out for Mr Whicher: not completely sure which, er….

If you do not live in the UK, you may not know too much about Jonathan Ross. But you can tell he has quite a hold on his audience when you see that his crowd of Twitter followers exceeds 250K. How then should a Twitter book club work? Well, it could simply be a stream of tweets about a particular book, and wossy has pushed off with one already: The Men Who Stare at Goats. Follow the conversation here.

A good idea which could be even better. Here are our suggestions for how this could evolve into a commercial proposition for someone:

  1. A celebrity chooses the book and each book gets a week of attention. Wossy has done this bit.
  2. Wossy needs to persuade the publisher (and the author or agent, if they need persuading) that this is a good idea, and to put the book on Open Access for a week whilst the Twitter stream goes to full volume. While this happens the book will get a lot of attention. In the shops. The Open Access platform might be a streaming solution such as Exact Editions runs, so the taps can be switched on at the start of the week and off at the end. (If any agent thinks that putting a book on open access for a week is going to exhaust the market for his book, he needs to find another profession, or another author).
  3. Not to beat about the bush (pulling the light from out of the bushel), there is another big advantage of using the Exact Editions platform in a promotional event such as this: every page in the books can be the subject of a direct link. Tweets can cite the books as they appear in the discussion. This type of public conversation really needs a method of targeting specific pages. Especially since Twitter is not going to have the space to allow real hand-crafted, cut and pasted, quotations. (Light goes back behind bushel, muttering that any distribution system for this idea also has to handle the e-commerce).
  4. After the week of tweeting and general discussion, the Open Access finishes but the printed book can of course be acquired through the bookshops, or a digital subscription to it taken out through the digital platform.
  5. At this point some costs have been incurred and a slice of revenue would be earmarked by the distribution and e-commerce platform (bushel smiles). For the sake of argument a Scribd type of percentage might be enough.
  6. That still leaves the majority of the revenue from this exercise which clearly goes to the publisher and the author.

Why are we broadcasting this idea in public, rather than gently sidling up to Wossy, or Stephen Fry or Oprah or whoever, with an NDA in our fists, and persuading them to do it? Mostly because NDAs seem such an untwitterish way to think about it!

Perhaps someone has a better idea. At any rate there is no copyright in ideas, and not much of a copyright in twitter streams: so if there is a better idea about the twitter book club it has our blessing. Meanwhile, if anyone wants to bounce the idea back at us, we look forward to hatching plans with publishers, agents, Oprah, Stephen Fry, the Real Shaq, or whoever. Let’s see it working…. Twitter is good for books.

Google Books Search: What is Good for Google is Good for the USA

There was an important Conference on the Legal and Publishing impact of the Google Books Settlement at Columbia Law School on Friday. Several attendees, led by Peter Brantley, were actively Twittering the event; see #gbslaw for the Twitter-stream. There are two very useful reflective summaries posted by Peter Hirtle (lawyer at Cornell Library).

Apparently one of the recurring themes in the conference was this mantra: “What is good for Google is good for the USA.” I am sure that it was said in jest or irony, but it must nevertheless have made the Google participants unhappy. Even if ironic, the comparison is wounding. Just now being compared to General Motors is nearly as bad as being compared to AIG, and is frankly worse than being compared to Microsoft (which would also be very unfair and unwelcome to Google, but the comparisons are coming…). The mantra is especially unfortunate, since it is far too close to the bone: the whole way the Google Book Search settlement is working out is far too US-centric, as though Detroit were the market, and the accessibility of digital books in the rest of the world were not a matter of importance to the US or to Google. General Motors has been building inefficient and slipshod cars which had limited appeal in the rest of the world and failed the ultimate tests of quality engineering and sustainability. Could Google fall into a similar trap of building too much, too wastefully, for local demand and national circumstance without full attention to all the factors which build quality, openness and sustainability? Apparently some anxieties on this score were raised at the meeting. Somewhere in the Twittering I saw someone questioning how the US would feel if another country adopted a similar approach: a private enclosure and database representation of all the books in the English language held by French libraries (the French, or even more probably the Chinese Union Database Library? It will probably happen). Can you imagine the uproar? Congressman Conyers would have most-unfavoured-nation legislation in train within a twinkling…

A lot of the books from these dusty stacks in Michigan and California are foreign published. Through the group of libraries in the US with which it is collaborating, Google will catch in its net of NotYet OutOfCopyright but OutOfPrint titles a vast swathe of books originally published by British, French and German publishers. Google has apparently spent $7 million in the last two months on press advertisements in over a hundred countries to advise authors and publishers of the rights that they may have in the Settlement to the use of their books in the US market ($7 million on print ads for the legal notice; few text database projects have had a total investment this large). But the authors of those books are also readers. What if the eventual legal and technological effect of the Settlement is to make access to those books much less viable in the countries in which they were written or published?

Spare a thought for Google: not only is it being compared to General Motors, it now also has to deliver on the very substantial obligations which the Settlement imposes on it, in particular to roll out commercial services to libraries and to individuals (to reiterate: these obligations are only to deliver services to the US market). This is going to keep Google very busy. Many critics of the Settlement have pointed out that it creates an enormous (millions of books) private preserve for Google, from books which look more like they belong to the public domain, either because they are orphans or because they are close to orphans. This monopolistic position is seen as an obstacle to competition. Of course it is in one way a matter of enormous advantage for Google.

But there is another way of looking at the situation. Google is now under the obligation, and the heavy public expectation, of delivering services from this massive collection. I believe that it will be under a very heavy public expectation and moral obligation to deliver, or find some legal way to enable, similar services to overseas markets. Google has assumed an onerous obligation to curate and deliver services for a large class of legacy titles. Inevitably it has been taking short-cuts: there is a weird absence of metadata, it has missed some quality goals, the books are not always exciting, and many of them are out of print for good reason. I suspect that the difficulty and the importance of this legacy task will in itself make it impractical for Google to be the innovator in the book space that it might like to become. It is much easier to deliver an innovative and truly revolutionary social service for book readers when you are not curating 10 million titles. Hirtle concludes his excellent notes with this:

Yet while there may be great disappointment with the process used to generate the settlement, I also detected no incipient revolution against the settlement itself. No one was calling for rights holders to register and submit comments to the court (as they can do until 5 May). No one was saying the court should reject it and tell the parties to start over. Yes, the class may be too large and the mechanism too crude, but we created this problem when we abandoned formalities, lengthened copyrights, and started treating every copyrighted item in the world like it was a Disney movie. Given this procrustean bed we have made for ourselves, the settlement may be our only way out. Yes, Congress should create a compulsory license authorizing the use of out-of-print books – but don’t hold your breath waiting for that. In the interim, the settlement may be the best we can hope for – even though it has the potential to radically alter all of our worlds. (Hirtle: Library Law Blog)

Google will probably get its way, for the most part, with the Settlement, but it may also find the bed it has made for itself, with the aid of Publishers and the Authors Guild, somewhat procrustean. The tasks it faces are Herculean. It will surely get a lot of attention from lawyers (within and without the business). There will be worries about monopoly and anti-trust but there will be plenty of competition.

Google Book Search Mobile

Yesterday Google announced a mobile implementation of their Book Search service: Google Book Search Mobile.

There are 1.5 million books available for complete reading on your mobile phone (iPhone or Android recommended), but fewer than half of them will be available outside the USA, for copyright reasons previously discussed on this blog. In an email forum a Google engineer estimated that 620,000 are readable ex-USA. Even so, 40% of a huge library is a very large library. There are some wonderful gems in the collection (Mark Twain, Florence Nightingale, Charles Dickens and lots of wonderful Victorian stuff). But I think that they are all really old books: browsing these stacks is a bit like sniffing around a very dusty and arcane book depository from about 1898. Some of these tomes have not been touched for decades.

If you like reading on your iPhone you now have a wonderful library of treasures to explore. You will need a battery booster.

There are some surprising features of the implementation. First, the system works by piping an ASCII version of the text on to your screen, so that it can ‘reflow’ to fit the dimensions of your screen. For some of these old books the ASCII version that Google infers from OCR is poor, in some cases unusable. Google will have to improve it (we can be sure that some engineers there are already relishing the challenge; improving the quality of the ASCII is a key requirement for the other things which can grow from GBS). Second, Google offers a ‘version’ of the printed page which you can tap through to, if you want to check up on the doubtful ASCII. This is not the full page of the book, but a section of the page or column of print synthesised for display. So the text appears in the original typeface and linebreaks but without the full page detail and without the original linespacing. Nor can the images be expanded in the usual iPhone style. This strikes me as an odd and complicated compromise. I wonder whether Google is paying too much attention to the temporary limitations of today’s screens. Will there be another, richer, enlargeable, photo-realistic layer for the A4 mobiles that will surely appear next year?

It will be interesting to see how enjoyable this platform becomes as a reading environment. It is certainly great for browsing and for searching. Will consumers expect this open and free library to become the foundation of their individual digital libraries? One guesses that this is the Google intention.

Google Book Search and the Tragedy of the Anti-Commons

Michael Heller, a property lawyer at Columbia University, has coined the term the ‘tragedy of the anti-commons’. This is a twist on the more familiar idea of ‘the tragedy of the commons’, which is thought to be the cause of such ecological disasters as the implosion of fisheries, perhaps even the nearing apocalypse of global heating. Heller’s insight is that too much private ownership can be as much of a problem as too little: “When too many owners control a single resource, cooperation breaks down, wealth disappears and everybody loses.” He gives plenty of examples in his book The Gridlock Economy, whose argument is forcefully stated in its subtitle: How Too Much Ownership Wrecks Markets, Stops Innovation, and Costs Lives.

There is a good chance that the Google Books Settlement is going to show us all how this tragedy of the anti-commons works out in the world of books. The Google project, which is backed by the American publishers and American authors’ representatives, should be (in my view will be) a wonderful resource for American universities, schools, public libraries and through them for American consumers. By 2011, if the Settlement is approved, at least 5 million out of print but not yet out of copyright [OOPnotYOOC] titles will be available to readers in the US market. This resource will have little opportunity to work so well for authors, readers and consumers in the rest of the world. The books will by and large not be available in the rest of the world (perhaps in American embassies?).

Google is already serving a very different and vastly narrower view of Google Book Search to the rest of the world (even to Canada and Mexico). Books which are public domain and wholly visible and readable in the US are not visible and readable elsewhere. And this copyright caution about territorial rights is unlikely to change, because the Settlement, when it is approved, is only going to be approved and agreed for the US market. Google has been persuaded (or has volunteered?) to accept the territorial restrictions and complications inherent in the market of copyright books. In my view, Google will not risk starting court actions in other jurisdictions, for the very simple reason that they might be lost, or worse still settled on a different basis from the US dispute. Google will be bound to leave the ex-US position of its wonderful aggregate of unloved (mostly ‘orphan’) copyrights in a national limbo. The orphans will remain unloved outside the 50 states.

The complexity of the rights situations of these millions of titles is effectively unmanageable and un-negotiable, which is pretty much what Michael Heller means by a tragedy of the anti-commons. By developing and growing an intricate and incredibly complex system of rights for different legal regimes and market territories, the publishing industry has produced a system where a negotiated and innovative new service is probably impossible. It would take something like a new Berne convention on copyright to make this a level playing field for all jurisdictions.

One might say that this hopeless and impenetrable thicket of rights, which are largely historical and dormant, is a problem for the rest of the world and for scholars outside the US. It is not a problem for the US, or for Google. Well, maybe… but it is also possible that this lack of international and global relevance will undermine the authority and the prestige of a US-centric resource. I wonder whether US scholars will accept a situation in which citations and references cannot be made and verified in a global context?

There is another dimension to the impenetrable complexity of the rights position of OOPnotYOOC titles: illustrations and photographs in these titles are in effect excluded from the scope of active exploitation by Google. Interestingly enough, Children’s Book Illustrations are to be treated differently. They are defined as ‘inserts’ and therefore fall within the scope of the settlement and will presumably be in the searchable and readable services that Google produces. But in the place of ordinary illustrations and photographs in books which are not ‘Children’s Books’ we should expect gaps or blanks, such as one already finds in the Google Book Search service. Eric Rumsey thinks that I may be on my own in reading the Google Settlement this way, but an apparently well-informed, anonymous commenter makes a similar point in a comment on the Martyn Daniels blog. Why should illustrations in Children’s Books be treated differently from those in other books? I suspect that the publishers and the Authors Guild felt that they could negotiate with certainty on these rights (as also on quotation rights, rights in poetry etc) but they knew that they could not negotiate for the owners of artistic rights.

Will it matter that Google Book Search, when it is marketed as a commercial subscription service for libraries and universities, cannot be accessed or read in the world at large? Will it matter that many of the photographs and illustrations in millions of the OOPnotYOOC titles will not be there? Yes, it will matter, and that it matters will be another instance of the tragedy of the anti-commons.