One of the problems facing the eBook market is that we are not entirely sure what to call the thing we are bringing into being. What are we talking about when we talk about eBooks? There is a fair old jumble of substantives clamouring for our attention: digital books, ebooks, electronic or computerised texts, textual databases, interactive texts, online books (or texts), plain old content, digital libraries, digital editions, web editions, etc. So here are some proposed definitions and qualifications:
An eBook is something which, in the general case, requires a dedicated eBook reader for it to be read. That is to say, an eBook is something that you will find in a Kindle, or a Sony eBook reader.
Distinct from the eBook is the file format in which the text is held. This might be a proprietary format such as the Kindle's AZW, or a non-proprietary one such as EPUB, the successor to the Open eBook format. An eBook will have a file format, and although it will generally be read through (on?) an eBook reader, this could be a virtual reader or software layer, not necessarily the physical hardware system for which the eBook was originally devised. So we have the Stanza reader on the iPhone or the PC, which will read a lot of different books in the same software environment. We also have the recently announced Kindle App, which in a manner of speaking puts a virtual Kindle on the iPhone (and a similar trick could be performed for other hardware platforms). Files in these formats, we have to point out, exist as distinct instances, so publishers and booksellers may sell individual copies of an eBook and track their destination through digital inventories. Files can also be corrupted and encrypted; some publishers in effect corrupt their files by encrypting them with Digital Rights Management (DRM) software. This practice is a really terrible idea which insults and damages the market, but publishers are tempted to use it because digital files cost next to nothing to copy, and books encoded in file formats look terribly vulnerable to illegal copying.
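To make the distinction between the book and its file format a little more concrete, here is a minimal sketch in Python of how software might recognise an EPUB file. It relies on one fact about the format: an EPUB is a ZIP archive whose first entry is a file named `mimetype` containing the text `application/epub+zip`. The function names `looks_like_epub` and `make_minimal_container` are our own inventions for illustration, and the check is deliberately shallow; it does not validate a complete, readable book.

```python
import io
import zipfile

# The media type string an EPUB container declares in its "mimetype" entry.
EPUB_MIMETYPE = b"application/epub+zip"


def looks_like_epub(data: bytes) -> bool:
    """Shallow check: is this a ZIP archive whose first entry declares EPUB?

    Hypothetical helper for illustration; a real reader would go on to
    parse the container and package files as well.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            # The "mimetype" entry is expected to come first in the archive.
            if not names or names[0] != "mimetype":
                return False
            return archive.read("mimetype").strip() == EPUB_MIMETYPE
    except zipfile.BadZipFile:
        # Not a ZIP archive at all, so certainly not an EPUB.
        return False


def make_minimal_container() -> bytes:
    """Build an in-memory ZIP holding only the EPUB "mimetype" entry.

    This is NOT a usable book, just enough to exercise the check above.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED
        )
    return buffer.getvalue()
```

A software reader in the Stanza mould would presumably run some test of this kind for each format it supports before deciding how to open a file. A DRM-encrypted file may pass such a container check and still be unreadable without the right key.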
The file format for eBooks works to define and individuate particular books or titles, or works, or serials or issues (yes, plenty of confusion there unless you keep all those matters straight). The Google Settlement has also introduced the fascinating new concept of an insert: an insert generally does not contain its own illustrations, but it probably will contain its own children’s book illustrations, diagrams, charts and graphs (but not maps). I hope that is all perfectly clear. If it is not, and further perusal of the Draft Settlement leaves you in any doubt, I suggest that you engage the services of a copyright lawyer (preferably one with a background in the part-work printing industry).
Having introduced the topic of Google Books, we should explain that whilst it is generally sensible to ask about the file format of an eBook, it is generally not relevant to ask about the file format of a digital edition. Here we may be engaging in some creative definition making, but we define and differentiate Google Book Search books and Exact Editions magazines and books as digital editions rather than eBooks. Digital editions are made from books as files, but it is not helpful to think of them as having a file format. The PDF which the publisher produced or the scanner generated is used to create a database, but the format of the book as input has little to do with the format of the book as it is used or browsed by the consumer. The Google and Exact Editions systems deliver books to users as a service: searchable, citeable and potentially shareable. They are web services in which each printed page has a corresponding web page, a web page with a JPEG at the centre and a chatter of more or less relevant HTML around it. One might think of the individual book as the total of its URLs, and thus postulate the individual title as having a web file format defined by its web instances. But this is not too helpful, because the books are held in a database, and it is the functionality of this database and the APIs to it which effectively generate and limit its performance and usability. So unlike eBooks, digital editions, we say, do not have file formats, and it is not relevant to use the language of unit sales or DRM. Digital editions come from the databases that host them, probably in the cloud, and this is a completely different ball of wax. Nor should you be deflected from this perception by the confusing fact that Google Book Search (and Exact Editions) offer you the opportunity to download a PDF, whether of individual pages or of the whole work.
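The idea of a book as "the total of its URLs" can be sketched in a few lines of Python. Everything here is hypothetical: the class name `DigitalEdition`, the base address and the `/pages/N` path pattern are assumptions made up for illustration, and they are not the actual URL scheme of Google Book Search or Exact Editions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DigitalEdition:
    """A title modelled as a web service rather than as a file."""

    base_url: str    # hypothetical service address
    title_id: str    # hypothetical identifier for the title
    page_count: int  # number of printed pages in the source book

    def page_url(self, page: int) -> str:
        """Return the web address standing in for one printed page."""
        if not 1 <= page <= self.page_count:
            raise ValueError(f"page {page} is outside 1..{self.page_count}")
        return f"{self.base_url}/{self.title_id}/pages/{page}"

    def all_page_urls(self) -> List[str]:
        """The book considered as the total of its URLs."""
        return [self.page_url(n) for n in range(1, self.page_count + 1)]


# Usage: a three-page title on an invented service address.
edition = DigitalEdition("https://example.invalid/editions", "sample-title", 3)
page_two = edition.page_url(2)
```

Note what the sketch leaves out: there is no file, no copy to count as a unit sale and nothing to wrap in DRM. Search, citation and sharing would all be functions of the database behind those addresses, which is why the list of URLs on its own is not a very helpful definition of the title.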
That may be enough confusing terminology for one day, but we will doubtless return to the subject in future postings.