Google Book Settlement: Redux

I go away for two days in the mountains and come back to find Google Books Settlement II, a 173-page renegotiated version of the original deal (red-lined). Catch up with Grimmelmann.

I am not sure that I will read the new contract; getting through the first version was quite a strain! Here is one view of why it matters and why it doesn’t.

The Google Book Settlement matters because something very like what is envisaged in the Settlement is going to happen, and that is a good thing. Google has 10 million books scanned and databased on their servers. They are revved up and no doubt waiting to go. A large part of the most valuable human knowledge of the twentieth century will be accessible, and will be read digitally, in American universities from some time in 2010. That is a very good thing. A very bad thing is that these books will NOT be available in the rest of the world, and the legal technicians have little idea how that broader access can be achieved. Even if the judge were to reject the Settlement, even if the DOJ were to file and insist upon some swingeing limitations to the scope of the agreement, most books will be Googled from now on. Ten million books have already been databased in the way pioneered by the Google Books system, many of them at the express request of their publishers. Some aspects of the Google project have been very controversial, which is why we have a court case and why the legal hostilities may meander on for years yet. The outcome will predictably be messy, but the change has occurred.

The publishing paradigm has shifted, and most books will now be accessible and searchable in various ways via Google (and perhaps, let us hope, via alternative search engines). Five years ago it was by no means obvious that all books would be digitised en masse and in their entirety (even many of the bindings); that they would be full-text searchable and page-rendered; that they would be straightforwardly citeable in something like the ways print scholars have cited books for centuries (volume, chapter, page); that they would be readable in much the same way as ordinary web pages; or that their illustrations and indexes would be in place (though for many twentieth-century books this will not be the case: as a direct result of the Settlement, orphan illustrations are even more orphaned than the texts). None of this was settled in 2004.

So there has been a revolutionary shift in the publishing paradigm. But now for the other shoe: I am not confident that Google’s ‘victory’ in the Settlement will be seen that way in the future. Thomas Kuhn coined the term ‘paradigm shift’ to describe how, in a scientific revolution, a period of upheaval gives way to a period of ‘normal science’, in which investigations proceed under the shelter of the new paradigm. I am not sure that Google will now find itself in a period of normal science or in calmer waters. Google has taken a tremendous step forward with its vision of a comprehensive digital database of published books, but the suspicion is growing that this is not a terminus. The Googled library/bookshop of all published literature is not a finished product, and it is highly probable that it will fairly soon be overtaken by other models and by other paradigm-shifting changes in the technology. I have the very strong intuition (it is merely that, and I cannot prove it) that digital books will soon be used in ways that surprise us greatly and have very little in common with the current operation of the Google Books service. We have barely started on the path of understanding how digital books can be used, and as an early entrant to the field, Google has every chance of finding itself out-innovated. Some Twitter-like disruptive service will doubtless come along soon to show us how computation really should work with digital editions.

Google will surely be making the incumbent’s mistake if they suppose that the Settlement really settles, solves or finalizes the direction that digital book technology should now take. To quote the Scottish sage: “I hae me doots.”

Wherefore Art Thou Roneo-ed?

A friendly journalist referred the other day to one of the Exact Editions Apps as a ‘photostat’ of the magazine. One of my colleagues noted that this was a word she hadn’t heard for 20 years, and my wife’s cryptic observation was: “At least he didn’t call it a roneo-ed version”. I was wondering whether this comment was a curious compliment. Have you noticed the way in which ‘vinyl’ has become a term for musical quality and tonal fidelity? Perhaps the roneo-ed Digital Magazine is the acme of approval in an app, and the photostat is just a step on the way to wax cylinder bliss?

Perhaps. But I think not. Exact Editions is so named because we believe that digital editions of magazines (books, newspapers, bibles, periodicals, catalogues, compendia, scores, and print objects in general) should be faithful representations of the digitised pages. But of course a digital edition has to be much more than a simple reproduction. To see the result as a photostat or a mere copy is to miss more than half the point. Here is a non-exhaustive list of the other features one should expect in a digital edition:

  • Every page in the digital edition should be citeable (which means that any other web page or web service can make a targeted reference within the publication to a specific page on which some relevant content resides).
  • Pages within a digital edition should have links out. That is to say, a digital edition should be just as capable of being the source of a citation as the target of one.
  • Putting these two points together (citing, and being cited) in effect means that every page in a digital edition will be a web page, a part of the web. Far too many of the ‘e-magazine’ solutions that we see in the marketplace fail this basic requirement. Many publishers have been using digital platforms which allow them to distribute blobs of content through the internet, but the publications are not proper web resources.
  • This fact about the pages or parts of digital editions being themselves a part of the web also has the consequence that a proper digital edition should be readable/usable by standard web browsers and it should be accessible to standard web operations (eg crawlers, counters, mashups and tweeters).
  • Every page with text on it in a digital edition should be capable of being searched by a web search engine such as Google or Bing, though a publisher may decide to withhold content from Google/Bing searching.
  • Every page in a digital edition should be searchable by the edition itself. That is, a digital edition should have a mechanism whereby a search for a term can be restricted to the publication itself.
  • Digital editions should also have their own appropriate internal navigation (eg live links from Tables of Contents, Indexes of Advertisers).
  • Digital editions should offer their users and readers immediate links to relevant web resources (eg live links to web resources mentioned in the text).
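The citeability and in-edition search requirements in the list above fit together naturally, and a few lines of Python can sketch how. This is a minimal illustration under stated assumptions, not Exact Editions' implementation: the base URL, the `DigitalEdition` class, the `/page/N` URL pattern and the page texts are all hypothetical. The idea is that once every page has a stable address, a search restricted to one edition can return hits that are themselves citeable links.

```python
from dataclasses import dataclass, field
from urllib.parse import quote

# Hypothetical host for a publisher's editions (not a real service).
BASE_URL = "https://editions.example.com"

@dataclass
class DigitalEdition:
    publication: str  # URL slug for the title (hypothetical)
    issue: str        # URL slug for the issue (hypothetical)
    pages: dict[int, str] = field(default_factory=dict)  # page number -> page text

    def page_url(self, page: int) -> str:
        """A stable, citeable address for one page: the target of a citation."""
        if page not in self.pages:
            raise KeyError(f"no page {page} in {self.publication}/{self.issue}")
        return f"{BASE_URL}/{quote(self.publication)}/{quote(self.issue)}/page/{page}"

    def search(self, term: str) -> list[str]:
        """Search restricted to this edition; each hit comes back as a citeable page URL."""
        needle = term.lower()
        return [self.page_url(n) for n in sorted(self.pages)
                if needle in self.pages[n].lower()]

# Example with invented page texts.
edition = DigitalEdition("example-weekly", "2010-02-27", {
    1: "Contents",
    2: "Politics: a paradigm shift at Westminster",
    3: "Books: Google and the publishing paradigm",
})
print(edition.search("Paradigm"))  # URLs for pages 2 and 3
```

Because search results are plain URLs, the same addresses serve crawlers, bloggers and other web services, which is the sense in which each page is "a part of the web".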

There are other things one could add to this list, but it is long enough to make the point. Digital editions are much, much more than roneo-ed editions. I am fairly sure that the early users of the Spectator App, which was released just over a week ago, have not yet realised how much coiled web energy there is inside the App. By my back-of-the-envelope estimates, there are over 200 issues, well over 10,000 digital pages, over 5,000 live phone number links, perhaps 20,000 internal links within the magazine issues and well over 20,000 external links from the issues, and all of this is accessible from the App. But the App is not really a product at all; it is a subscription service to a database which manages the magazine for its subscribers. Publishers are all now in the service industry, and digital editions are the service that they should be offering their readers.

How Will We Read? Google or Kindle

Yesterday, Amazon announced the Kindle 2 at a press conference at the Pierpont Morgan Library. A few days ago Google released Google Book Search Mobile, a mobile version of their rapidly growing digital library. What do these two events tell us about the way that Amazon and Google see us reading digital books?

  1. Amazon have 230,000 books available for purchase on the Kindle. Google have 1.5 million.
  2. The Kindle is still only available to US purchasers. The Google Book Search resource can be used everywhere, but only 620,000 of those titles can be read in full and accessed page by page outside the USA.
  3. To read the digital books in the Amazon collection you need to first buy a dedicated reader which will cost you $399. The Google titles can be read on any system which supports a web browser.
  4. The Amazon collection includes many front-list titles and many new best-sellers, and they are often priced at $9.99 (ie less than the price of the equivalent hardback). All the Google books are old titles, many of them of very little interest to today’s audience. But there are plenty of great works of literature and masses of curiosities (it’s a bit like Great Granny’s library). They are all free. They are all free….
  5. You can import free books to the Kindle system (eg from a PDF file), but you can certainly import those titles to whatever device you are using to access the Google collection.
  6. Amazon’s system allows the text of any book to ‘flow’ into a scale and type size that suits your reading style. Google BSM also now allows for a downloadable ‘reflowable’ version of the text.
  7. Both systems will allow you to download a book to read it on a plane (or on the subway, where you cannot access broadband). But for some of the older books, the Google ASCII downloads are sub-standard.
  8. Amazon gives much more information about the books in the system. Google has very limited metadata and it is not at all easy to tell what books they have in their ‘library’ (which, given the absence of a catalogue, may be too polite a term for it).
  9. Amazon’s and Google’s systems both ‘look’ rather monopolistic, but at this point the Google system is free, so it may be churlish to worry about the Google monopoly just yet. Amazon’s potential monopoly may worry publishers, but Amazon today has plenty of competition from other systems which operate a similar downloadable ebook service (Sony, iLiad, Plastic Logic coming, etc).
  10. Every book and every page in the Google service can be directly cited and referenced, which makes their books much better for bloggers and social networks. Amazon’s file format is non-standard, proprietary and wrapped with its own form of DRM (digital rights management).
  11. Kindles still lack colour. 16 shades of grey is the best that can be offered for illustrations. Few of the books accessible to Google BSM have any colour, but when contemporary titles are offered on the Google platform, colour will be no problem for Google.
  12. Neither platform supports the concept of ‘first sale’ (ie second-hand books): Google because it is fundamentally an access model, Amazon/Kindle because of DRM.
  13. You can’t read Google Books on a Kindle and you can’t read the Kindle books on a PC or a mobile phone. But there have been rumours of Amazon making an announcement of a mobile version of the Kindle system (and Bezos mentioned synchronisation at the press conference), so this might be coming. If and when that happens, mobile users will be able to compare Kindle books and GBSM books on the same device.

I have not been keeping the score on these comparisons. Many of them are qualitative trade-offs on which different constituencies will break in opposite directions, but my sense is that in the big picture Google is winning this hands down. The library of our future is going to be much more imbued with Google than with Amazon innovations. Starting from a large and substantially free and open foundation, Google will be attracting many developers and innovators to their track. It will be fascinating to see whether some Android (Google mobile phone) devices appear that are specifically geared to the books/newspaper market. Google is aiming to make its Book Search a resource for the whole web and will build a system with which many other web services can interoperate. Amazon is building a system which will allow individuals to purchase and collect titles in a format which suits them and their private needs. Google is aiming at aggregation and integration; Amazon is limited in its vision to using the web as a distribution medium.

Google Book Search and the Tragedy of the Anti-Commons

Michael Heller, a property lawyer at Columbia University, has coined the term the ‘tragedy of the anti-commons’. This is a twist on the more familiar idea of ‘the tragedy of the commons’, which is thought to be the cause of such ecological disasters as the implosion of fisheries, perhaps even the nearing apocalypse of global heating. Heller’s insight is that too much private ownership can be as much of a problem as too little: “When too many owners control a single resource, cooperation breaks down, wealth disappears and everybody loses.” He gives plenty of examples in his book The Gridlock Economy, whose argument is forcefully stated in its subtitle: How Too Much Ownership Wrecks Markets, Stops Innovation, and Costs Lives.

There is a good chance that the Google Books Settlement is going to show us all how this tragedy of the anti-commons works out in the world of books. The Google project, which is backed by the American publishers and the American authors’ representatives, should be (in my view will be) a wonderful resource for American universities, schools, public libraries and, through them, for American consumers. By 2011, if the Settlement is approved, at least 5 million out of print but not yet out of copyright [OOPnotYOOC] titles will be available to readers in the US market. This resource will have little opportunity to work so well for authors, readers and consumers in the rest of the world. The books will by and large not be available in the rest of the world (perhaps in American embassies?).

Google is already serving a very different and vastly narrower view of Google Book Search to the rest of the world (even to Canada and Mexico). Books which are public domain and wholly visible and readable in the US are not visible and readable elsewhere. And this copyright caution about territorial rights is unlikely to change, because the Settlement, when it is approved, is only going to be approved and agreed for the US market. Google has been persuaded (or has volunteered?) to accept the territorial restrictions and complications inherent in the market of copyright books. In my view, Google will not risk starting court actions in other jurisdictions, for the very simple reason that they might be lost, or worse still settled on a different basis from the US dispute. Google will be bound to leave the ex-US position of its wonderful aggregate of unloved (mostly ‘orphan’) copyrights in a national limbo. The orphans will remain unloved outside the 50 states.

The complexity of the rights situations of these millions of titles is effectively unmanageable and un-negotiable, which is pretty much what Michael Heller means by a tragedy of the anti-commons. By developing and growing an intricate and incredibly complex system of rights for different legal regimes and market territories, the publishing industry has produced a system in which a negotiated and innovative new service is probably impossible. It would take something like a new Berne Convention on copyright to create a level playing field across all jurisdictions.

One might say that this hopeless and impenetrable thicket of rights, which are largely historical and dormant, is a problem for the rest of the world and for scholars outside the US. It is not a problem for the US, or for Google. Well, maybe… but it is also possible that this lack of international and global relevance will undermine the authority and the prestige of a US-centric resource. I wonder whether US scholars will accept a situation in which citations and references cannot be made and verified in a global context?

There is another dimension to the impenetrable complexity of the rights position of OOPnotYOOC titles: illustrations and photographs in these titles are in effect excluded from the scope of active exploitation by Google. Interestingly enough, Children’s Book Illustrations are to be treated differently. They are defined as ‘inserts’ and therefore fall within the scope of the Settlement, and will presumably appear in the searchable and readable services that Google produces. But in place of ordinary illustrations and photographs in books which are not ‘Children’s Books’, we should expect gaps or blanks, such as one already finds in the Google Book Search service. Eric Rumsey thinks that I may be on my own in reading the Google Settlement this way, but an apparently well-informed anonymous commenter makes a similar point in a comment on the Martyn Daniels blog. Why should illustrations in Children’s Books be treated differently from those in other books? I suspect that the publishers and the Authors Guild felt that they could negotiate with certainty on these rights (as also on quotation rights, rights in poetry etc) but knew that they could not negotiate for the owners of artistic rights.

Will it matter that Google Book Search, when it is marketed as a commercial subscription service for libraries and universities cannot be accessed or read in the world at large? Will it matter that many of the photographs and illustrations in millions of the OOPnotYOOC titles will not be there? Yes, it will matter, and that it matters will be another instance of the tragedy of the anti-commons.

Naming of Parts

We now have a flexible system of web access management that allows a publisher to select areas of a book which can be accessed and sampled in full view before purchase. For example, here are some full-size pages from the Time Out City Guide to London.

From the book’s home page, there are some named links which allow the user to grasp the context of sample pages that might be of interest. Bloomsbury and Fitzrovia is a better handle in a city guide to London than pages 106-113.
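Naming a part amounts to attaching a human-friendly handle to a page range and deciding whether that range is open for sampling. The sketch below is a minimal Python illustration of that idea, assuming hypothetical names (`NamedPart`, `resolve`, `can_view`) and an invented table of parts; only the Bloomsbury and Fitzrovia range of pages 106-113 comes from the text, and this is not a description of any real access-management system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NamedPart:
    name: str             # human-friendly handle, e.g. "Bloomsbury and Fitzrovia"
    first: int            # first print page in the part (inclusive)
    last: int             # last print page in the part (inclusive)
    sample: bool = False  # True if the publisher opens this part before purchase

    def pages(self) -> range:
        return range(self.first, self.last + 1)

# Hypothetical parts for a city guide; all ranges except 106-113 are invented.
GUIDE_PARTS = [
    NamedPart("Introduction", 1, 12, sample=True),
    NamedPart("Bloomsbury and Fitzrovia", 106, 113, sample=True),
    NamedPart("Hotels", 240, 275),
]

def resolve(parts: list[NamedPart], name: str) -> NamedPart:
    """Look up a named part by its handle, ignoring case."""
    for part in parts:
        if part.name.lower() == name.lower():
            return part
    raise KeyError(name)

def can_view(parts: list[NamedPart], page: int) -> bool:
    """A page is viewable before purchase if it falls inside any sampled part."""
    return any(part.sample and page in part.pages() for part in parts)

print(resolve(GUIDE_PARTS, "bloomsbury and fitzrovia").pages())  # range(106, 114)
print(can_view(GUIDE_PARTS, 110), can_view(GUIDE_PARTS, 250))    # True False
```

Keeping the name and the sampling decision on the same record means the handle a reader clicks and the pages the publisher opens cannot drift apart.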

So now we need a good methodology for encouraging publishers to name and open relevant parts of their books for sample access. The obvious solution, the one we have adopted, is probably the right one: Chapter 1, or Chapter 1’s title, Bibliography, Table of Illustrations, etc. Is there a way of extending this nomenclature to readers and users? Is a vocabulary for chunked reference in books something that they will want? When every print page is a web page, it’s a simple enough matter to provide names for groups of pages. Will each web-published book acquire its own patina of user-generated tags in the way that Flickr and now have clouds of very helpful handles? It’s an intriguing possibility, especially since the handles would be used by other programs and resources.
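If readers were allowed to tag page ranges, the tags could be aggregated into a cloud and looked up by page, so that other programs could use them as handles. Here is a minimal Python sketch under that assumption; the in-memory store, the function names and the example tags are hypothetical, and a real service would need persistence and per-user records.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: tag -> list of inclusive (first, last) page ranges.
tags: dict[str, list[tuple[int, int]]] = defaultdict(list)

def add_tag(tag: str, first: int, last: int) -> None:
    """Record one reader's tag on pages first..last (inclusive), normalising case and spaces."""
    if first > last:
        raise ValueError("first page must not come after last page")
    tags[tag.strip().lower()].append((first, last))

def tag_cloud() -> Counter:
    """How often each tag has been applied: the raw weights for a tag cloud."""
    return Counter({tag: len(ranges) for tag, ranges in tags.items()})

def tags_for_page(page: int) -> list[str]:
    """Every tag whose ranges cover the given page, sorted alphabetically."""
    return sorted(tag for tag, ranges in tags.items()
                  if any(first <= page <= last for first, last in ranges))

# Example: three invented reader tags on pages of a city guide.
add_tag("British Museum", 106, 108)
add_tag("british museum ", 107, 107)
add_tag("Pubs", 110, 113)
print(tag_cloud().most_common())  # [('british museum', 2), ('pubs', 1)]
print(tags_for_page(107))         # ['british museum']
```

Normalising case and whitespace is a deliberate choice here: without it, near-identical tags would split the cloud's weights.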

Henry Reed’s Naming of Parts

To-day we have naming of parts. Yesterday,
We had daily cleaning. And to-morrow morning,
We shall have what to do after firing. But to-day,
To-day we have naming of parts……