Language

Although the global language for the publication is set in the EPUB package document metadata, it's still good practice to specify the language in each of your content documents. In an age of cloud readers, assistive technologies might not have access to the default language if you don't. The only exception would be a reading system that rewrites your content files to include the language from the package document, and assuming that will happen is a bad bet. Without a declared language, you can impair an assistive technology's ability to render text-to-speech playback properly and affect how refreshable braille displays render characters.

An xml:lang attribute on the root html element is all it takes to globally specify the language in XHTML content documents. For compatibility purposes, however, you should also include the HTML lang attribute. Both attributes must specify the same value when they’re used.

We could indicate that a document is in German as follows:

<html … xml:lang="de" lang="de">

Similarly, for SVG documents, we add the xml:lang attribute to indicate that the title, description, and other text elements are in French:

<svg … xml:lang="fr">

You should also clearly identify any prose within your book that is in a different language from the publication:

<p>She had an infectious <i xml:lang="fr" lang="fr">joie de vivre</i> mixed with a
    certain <i xml:lang="fr" lang="fr">je ne sais quoi</i>.</p>

The xml:lang attribute can be attached to any element in your XHTML content documents (and the lang attribute is again included for compatibility). Properly indicating when the language of words, phrases, and passages changes allows text-to-speech engines to voice the words in the correct language and apply the proper lexicon files, as we'll see in more detail in the text-to-speech section.
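The same attributes work just as well at the block level, for longer passages. As a minimal sketch (the sentences and the choice of a blockquote are illustrative only), a German quotation inside an English-language document could be marked up like this:

```html
<p>The letter closed with a single line in German:</p>
<!-- The language change applies to everything inside the blockquote -->
<blockquote xml:lang="de" lang="de">
    <p>Wir sehen uns morgen.</p>
</blockquote>
```

Because the language is inherited, the inner paragraph needs no attributes of its own; only an element whose language differs again would need to override it.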

Logical Reading Order

Although you'll hear that every EPUB 3 has a default reading order, that order is not necessarily the same thing as the logical reading order, or primary narrative. The spine element in the EPUB 3 package document defines the order in which a reading system should render content files as you move through the publication. This default order enables a seamless reading experience, even though the publication may be made up of many individual content files (e.g., one per chapter).

Although the main purpose of the spine is to identify the sequence in which documents are rendered, you can also use it to distinguish primary from auxiliary content files. The linear attribute on each child itemref element indicates whether the referenced content file contains primary reading content. If a content file contains auxiliary material that would normally appear at the point of reference but is not considered part of the main narrative, mark it as such so that readers can choose whether to skip it.

For example, if you group all your chapter end notes in a separate content document, you could indicate their auxiliary status as follows:

<spine>
    …
    <itemref idref="chapter1"/>
    <itemref idref="chapter1-notes" linear="no"/>
    <itemref idref="chapter2"/>
    <itemref idref="chapter2-notes" linear="no"/>
    …
</spine>

A reader could now skip these sections and continue following the primary narrative uninterrupted. But this is only a coarse measure for distinguishing primary content at the macro level; it does nothing to distinguish the primary narrative flow within any individual document. (For simple works of fiction that contain a single unbroken narrative, however, it may be all you need.)

Sighted readers don't typically think about the logical reading order within the chapters and sections of a book, because they can visually identify secondary content and read around it as desired. A reading system, however, can't do the same unless you provide that information in the markup (those semantics, again).

As I touched on in the discussion of keeping style separate from content, you can, for example, give a sidebar a nice colorful border and visually offset it from the narrative using a div and CSS, but when style is all you use, only readers who can see that styling get the information. Using a div instead of an aside element means a reading system will not know by default that it can skip the sidebar if the reader has chosen to follow only the primary narrative.
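To make the contrast concrete, here is a minimal sketch of the same sidebar marked up both ways. The class name sidebar and the heading and paragraph text are hypothetical, and the CSS that would draw the border is omitted:

```html
<!-- Styled only: a reading system sees a generic container -->
<div class="sidebar">
    <h2>Did You Know?</h2>
    <p>Sidebar text goes here.</p>
</div>

<!-- Semantic: aside marks the content as tangential to the main narrative -->
<aside class="sidebar">
    <h2>Did You Know?</h2>
    <p>Sidebar text goes here.</p>
</aside>
```

Both versions can carry the same class, so switching to aside need not change the visual presentation at all.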

When you mis-tag content this way, someone listening to the book with a text-to-speech engine will have the narrative interrupted by playback of the sidebar div. The only solution at the reader's disposal might be to slowly move forward until they find the next paragraph that sounds like a continuation of what they were just listening to (div elements aren't always escapable). Picture trying to read, and keep your train of thought, while sidebars, notes, warnings, and all the other peripheral material a book might contain constantly interrupt you.

For this reason, make sure to identify content that is not part of the primary narrative as such. The aside element is particularly useful for marking text that is not of primary importance, but even seemingly small steps, like putting all images and figures in figure elements, allow the reader to decide what additional information they want presented. I'll return to how to tag many of these structures as we go.
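As a sketch of that small step, an image and its caption can be wrapped together in a figure element. The file name, alt text, and caption here are hypothetical:

```html
<!-- figure groups the image and its caption into one self-contained unit -->
<figure>
    <img src="images/river-map.png" alt="Map of the river's course from its source to the delta"/>
    <figcaption>Figure 1. The river's course.</figcaption>
</figure>
```

The alt text still describes the image itself; the figure element adds the structural information that the whole unit sits outside the running text.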

The EPUB 3 Structural Semantics Vocabulary is also a useful reference when deciding which semantics and elements to apply to a given structure. Each semantic defined in the vocabulary indicates which HTML element(s) it is intended to be used with.
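In XHTML content documents, these semantics are applied with the epub:type attribute, which belongs to the EPUB namespace (http://www.idpf.org/2007/ops). As a minimal sketch, a footnote, which the vocabulary pairs with the aside element, and the link that references it, which uses noteref on an a element, might look like this. The id, title, and note text are hypothetical:

```html
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xml:lang="en" lang="en">
    <head>
        <title>Chapter 1</title>
    </head>
    <body>
        <!-- noteref identifies the link as a reference to a note -->
        <p>The treaty was signed in the spring.<a epub:type="noteref" href="#note01">1</a></p>
        <!-- footnote on aside: secondary content outside the primary narrative -->
        <aside epub:type="footnote" id="note01">
            <p>The exact date is disputed.</p>
        </aside>
    </body>
</html>
```

Reading systems that support these semantics may use them to present notes on demand or to skip them during continuous playback, but support varies, so the underlying HTML element still has to carry the structure on its own.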