How to make an e-book the hard way

Producing an ebook for the Kindle is straightforward. Amazon provide various conversion utilities, along with detailed documentation on the best way to format a Word document for automatic conversion. There is little to stop anyone submitting a book and putting it up for sale on Amazon.

Amazon's conversion software should suffice for most situations. I use it for sending manuscripts and papers to my Kindle. The formatting is sometimes mangled, but it's good enough to read. With care it ought to be possible to lay out a Word document that looks fine after conversion. However, it's interesting to know exactly how things work.

The only tool you need is Amazon's kindlegen (the Kindle Previewer looks incredibly useful but doesn't work on Linux so I've not used that yet). What follows is not really a tutorial, more a pointer to existing resources.

kindlegen is a command-line tool that takes an EPUB file and converts it into the .mobi format used by the Kindle. Since an EPUB file is a collection of text files, these can be written by hand. Most of the heavy lifting has been done by Craig Mod, whose Ahab project provides a template for producing EPUB files for use on the Kindle.

The main file is content.opf, the Open Packaging Format file, which contains the book's metadata. It's a simple matter to point kindlegen at this file:

kindlegen content.opf -o book.mobi
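
When rebuilding the book repeatedly, it can be handy to wrap that command in a small script. The following is only a sketch: the function names are my own invention, and it assumes kindlegen is on the PATH. It reports kindlegen's exit code instead of treating a non-zero value as a failure, because the tool seems to signal warnings (such as the CSS one mentioned below) that way as well as errors.

```python
import subprocess
from pathlib import Path


def kindlegen_command(opf_path, output_name):
    """Build the argument list for the kindlegen call shown above."""
    # Assumption: -o expects a bare file name, and kindlegen writes the
    # result next to the input file, so any directory part is dropped.
    return ["kindlegen", str(opf_path), "-o", Path(output_name).name]


def build_mobi(opf_path, output_name="book.mobi"):
    """Run kindlegen and return (exit_code, combined_output).

    Raises FileNotFoundError if kindlegen is not installed or not on PATH.
    """
    result = subprocess.run(
        kindlegen_command(opf_path, output_name),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Calling build_mobi("content.opf") from the directory holding the OPF file should behave like the command above, with the log returned as a string for searching or saving.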

When running against the Ahab templates, kindlegen issues a warning about a max-length CSS property. This is not actually a problem and details are given in the comments of the CSS file.

Ahab provides a good guide to how the EPUB fits together, and which sections need editing. I added my own content to the HTML directory and updated each of the XML files to point to my own content. Nothing about this was particularly tricky – however, I have yet to produce a cover that works on different versions of the Kindle, so I might need to expand on that in a future post.

The resulting .mobi file can be emailed to a Kindle for review or transferred directly through USB. Content and layout issues aside, it's not too hard to produce an ebook.

The only thing that wasn't immediately obvious to me was why there was both an HTML table of contents and one defined in the NCX file. The NCX file contains a series of navPoints, which appear as the tick marks in the progress bar at the bottom of the Kindle screen. The navPoints are also used to jump to the next section in a book. The HTML file is the table of contents that appears within the book's main text. Since the two are maintained separately, the NCX file and the HTML table of contents can contain completely different information. From what I've read, the NCX file is not used by newer models like the Kindle Touch and Kindle Fire.
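
To see exactly which navPoints a book defines, the NCX file can be read with Python's standard XML parser. This is a rough sketch and not part of Ahab: the sample NCX is trimmed to the navMap, and the ids and file names in it are made up for illustration.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}

# Hypothetical, trimmed NCX file; a real one also has <head> and <docTitle>.
SAMPLE_NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter One</text></navLabel>
      <content src="html/chapter1.html"/>
    </navPoint>
    <navPoint id="ch2" playOrder="2">
      <navLabel><text>Chapter Two</text></navLabel>
      <content src="html/chapter2.html"/>
    </navPoint>
  </navMap>
</ncx>
"""


def list_nav_points(ncx_text):
    """Return (label, target) pairs for every navPoint, in document order."""
    root = ET.fromstring(ncx_text.encode("utf-8"))
    points = []
    for nav_point in root.iter("{%s}navPoint" % NCX_URI):
        label = nav_point.findtext(
            "ncx:navLabel/ncx:text", default="", namespaces=NCX_NS
        )
        content = nav_point.find("ncx:content", NCX_NS)
        target = content.get("src") if content is not None else None
        points.append((label.strip(), target))
    return points


for label, target in list_nav_points(SAMPLE_NCX):
    print(label, "->", target)
```

Comparing this list against the links in the HTML table of contents is a quick way to spot where the two have drifted apart.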

Page breaks are interesting. EPUB files have an implicit page break after each of the source HTML files linked in the OPF file. Apparently it is possible to add an explicit page break through some CSS trickery, but this is something of a hack. There is no reason not to have a separate HTML file for each piece of content, although this might mean having rather a lot of source files to maintain. EDIT: see below
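
Since each linked HTML file ends with an implicit page break, the spine of the OPF file effectively lists where those breaks fall. The sketch below resolves the spine into an ordered list of files. The sample OPF is hypothetical and trimmed (a real one also needs a metadata section), and the file names are placeholders.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, trimmed content.opf: manifest and spine only.
SAMPLE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="BookId">
  <manifest>
    <item id="toc" href="html/toc.html" media-type="application/xhtml+xml"/>
    <item id="post1" href="html/post1.html" media-type="application/xhtml+xml"/>
    <item id="post2" href="html/post2.html" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="toc"/>
    <itemref idref="post1"/>
    <itemref idref="post2"/>
  </spine>
</package>
"""


def reading_order(opf_text):
    """Return the HTML files in spine order; a page break follows each one."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    # Map manifest ids to file paths, then follow the spine's idrefs.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


for href in reading_order(SAMPLE_OPF):
    print(href)
```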

There are a few things I'm still not sure about. One is DRM and the other is how page numbers are added, which I can't get to work. Most of the issues I've encountered are about proofing. There are also some interesting issues with my current project, converting some blog posts into an ebook. These touch on some fairly fundamental issues with ebooks and deserve their own post.

Hand-crafting a .mobi file is a fairly small thing, but knowing how an EPUB is built makes it possible to do some interesting things. More to follow.

EDIT (15/10/12): For a Kindle-format (Mobipocket) file, the correct way to add a page break is to use the <mbp:pageBreak/> tag. The mbp namespace does not need to be defined within the XML document. I am now going to read the Mobipocket documentation properly.
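
With that tag, several pieces of content can share one HTML file. As an illustration only, here is a sketch that inserts the tag before every h1 heading except the first. Breaking at h1 is my own assumption about where sections start, and the function name is made up.

```python
import re

# The tag exactly as written in the note above.
PAGE_BREAK = "<mbp:pageBreak/>"

# Matches the opening tag of an <h1> element, with or without attributes.
H1_OPEN = re.compile(r"<h1\b[^>]*>", re.IGNORECASE)


def add_page_breaks(html):
    """Insert a page break before every <h1> except the first one."""
    seen = 0

    def replace(match):
        nonlocal seen
        seen += 1
        if seen == 1:
            # The first heading already starts on a fresh page.
            return match.group(0)
        return PAGE_BREAK + "\n" + match.group(0)

    return H1_OPEN.sub(replace, html)


sample = "<h1>One</h1>\n<p>a</p>\n<h1 class='x'>Two</h1>\n"
print(add_page_breaks(sample))
```

This works on the raw text on purpose. Because the mbp prefix is never declared, a namespace-aware XML parser would most likely refuse to read the file once the tag has been added.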