To Reflow or to Cite?

The Association of American Publishers have produced a letter in support of the IDPF’s EPUB standard. There are so many things wrong with this approach that it is hard to know where to start. This quotation is representative of the substance of the letter:

….For books with text that can be reflowed, many publishers would like to create and deliver to retailers and/or wholesalers EPUB files. If a proprietary e-book format is then needed, it is expected that the retailer and/or wholesaler will take on the effort to convert the EPUB file in a scalable, high fidelity way that either preserves the layout and design of the original or otherwise delivers the content in a rendering acceptable to the publisher.

First, let us grant that EPUB has a role to play as a safe and neutral file format for the various proprietary eBook standards to aim at as a conversion bridge. But it is really a very small role, and in my view PDF will be a much more important archival and preservative file format than the EPUB specification. Second, of course we are in favour of standards and different sectors of the industry collaborating to support them. But nearly everything else about the AAP’s letter is off-base or highly debatable.

  • It is not clear that digital books really have to have a file format. Thinking of books as digital text files is not the way that Google Book Search works (nor is it the way we think at Exact Editions). Books are the building blocks of Google Book Search, but they are not necessarily or primarily files. The GBS view has them as collections of web pages (managed by a scalable database that hosts many books).
  • Then there is this new ‘reflow’ concept. There is a growing general presumption that reflowable books are desirable. Transitive activity verbs tend to have a positive connotation. It is better that you have a book that you can {copy, lend, read, sell, flow, reflow}, right? Well maybe, but a reflowable book is a book that you cannot cite, that you probably cannot bookmark, that a search engine will not be able to search directly… From many standpoints reflowable books/texts are a second-best idea. Do we hear librarians, historians and curators calling for reflowable books, with tables and indexes which lose their bearings, pages that cannot be cited and typography that is messed up?
  • If you decide on a distribution channel that permits ‘reflow’, your book in that format will not have determinate page references and citations. That is such a big loss that every book which is ‘reflowable’ may need to have a referentially stable primary edition.
  • What on earth can the AAP mean by expressing the hope that the industry will have ‘completed’ the transition to the EPUB standard by October 2008? Completed what?

It is a mostly waffly and empty letter and will not carry weight in the tussle between Google (which should have minimal need for the EPUB format) and Amazon, which is broadly on the books-are-a-file side of the fence and ought to be using EPUB for its Kindle, but is not. Whether digital books are citable, searchable, page-fixed digital resources or electronic texts within a Kindle/Sony/Iliad reader will be clearer in a year or two. I doubt that it will be settled by October of this year.

Final irony. Reflowable and easily copyable texts have their purposes. One of them would be to make it easier for people to copy statements put out on the web. The AAP letter is such a circumstance, but so far from being in a reflowable or easily copied format, their letter has been put up as a simple JPEG and I had perforce to retype the passage quoted above (any errors of transcription are mine).


  1. Publishers’ PDFs obviously get momentous support from CPI’s offer to manage their digital assets, together with Exegenix… About ‘re-flow’, it depends on the stage or granularity at which it is managed: the Fuji Xerox PARC demo of its techniques for displaying images of documents on small screens and managing links, indexes, zooms and ‘readability’ through reflow is worth considering, at least for marginal, emergency situations where a reader wants to read a book in less than optimal conditions.

  2. I don’t see why a ‘reflowable’ book necessarily has to lose meaningful indexes, tables, TOCs, etc. HTML is reflowable, but you can refer to parts of it by using a fragment identifier “#section” in the URL. This refers to an anchor in the text. Indexes, TOCs, and citations could link directly to the appropriate paragraph by using a similar anchor, or navigate by specifying a section header and paragraph number within that section (or other reflow-invariant indicator).
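The fragment-identifier idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything taken from the EPUB specification or the AAP letter: the locator scheme `section-id.pN`, the `AnchorIndex` class, the `resolve` function, the sample `BOOK` markup and the example.com URL are all hypothetical. It assumes every section heading carries an `id` attribute and that paragraphs are counted from 1 within each section, so the locator stays the same however the text is reflowed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Hypothetical sample text: two sections, each heading carrying an id.
BOOK = (
    '<h2 id="reflow">Reflow</h2>'
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    '<h2 id="cite">Citation</h2>'
    "<p>Only paragraph.</p>"
)


class AnchorIndex(HTMLParser):
    """Collect paragraph text keyed by a 'section-id.pN' locator (assumed scheme)."""

    def __init__(self):
        super().__init__()
        self.section = None   # id of the most recent heading
        self.count = 0        # paragraphs seen so far in that section
        self.current = None   # locator of the paragraph being read, if any
        self.paragraphs = {}  # locator -> paragraph text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3") and "id" in attrs:
            # A new section starts; paragraph numbering restarts.
            self.section = attrs["id"]
            self.count = 0
        elif tag == "p" and self.section is not None:
            self.count += 1
            self.current = f"{self.section}.p{self.count}"
            self.paragraphs[self.current] = ""

    def handle_endtag(self, tag):
        if tag == "p":
            self.current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.current is not None:
            self.paragraphs[self.current] += data


def resolve(citation_url, html):
    """Return the paragraph a citation URL points at, or None if absent."""
    _, fragment = urldefrag(citation_url)
    index = AnchorIndex()
    index.feed(html)
    index.close()
    return index.paragraphs.get(fragment)


print(resolve("http://example.com/book#reflow.p2", BOOK))
```

A citation written this way names a section and a paragraph rather than a page, which is the kind of reflow-invariant indicator the comment has in mind; whether that is an adequate substitute for a stable page reference is exactly the point in dispute.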

