Although the global language for the publication is set in the EPUB package file metadata, it’s still good practice to specify the language in each of your content documents. In an age of cloud readers, assistive technologies might not have access to the default language if you don’t (unless the reading system rewrites your content file to include the information from the package document, which is a bad assumption to make). Without a declared language, an assistive technology may not be able to render text-to-speech playback properly, and refreshable braille displays may not render characters correctly.
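As a rough sketch of the package-level declaration referred to above, the publication language is normally carried in a dc:language element in the package metadata. The en value here is only an illustration, and the rest of the metadata is elided:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
…
<dc:language>en</dc:language>
…
</metadata>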
An xml:lang attribute on the root html element is all it takes to globally specify
the language in XHTML content documents. For compatibility purposes,
however, you should also include the HTML lang
attribute. Both attributes must specify the same value when they’re
used.
We could indicate that a document is in German as follows:
<html … xml:lang="de" lang="de">
Similarly, for SVG documents, we add the xml:lang attribute to indicate that the title, description, and
other text elements are in French:
<svg … xml:lang="fr">
You should also clearly identify any prose within your book that is in a different language from the publication:
<p>She had an infectious <i xml:lang="fr" lang="fr">joie de vivre</i> mixed with a
certain <i xml:lang="fr" lang="fr">je ne sais quoi</i>.</p>
The xml:lang attribute can be attached to any
element in your XHTML content documents (and the lang attribute is again included for compatibility). Properly
indicating when the language of words, phrases, and passages changes allows
text-to-speech engines to voice the words in the correct language and apply
the proper lexicon files, as we’ll return to in more detail in the
text-to-speech section.
Although you’ll hear that all EPUB 3s have a default reading order, it’s not
necessarily the same thing as the logical reading order, or primary
narrative. The EPUB 3 spine element in the
package document defines the order in which a reading system should
render content files as you move through the publication. This default order
enables a seamless reading experience, even though the publication may be
made up of many individual content files (e.g., one per chapter).
But although the main purpose of the spine is to
identify the sequence in which documents are rendered, you can use it to
distinguish primary from auxiliary content files. The linear attribute can be attached to the child itemref elements to indicate whether the
referenced content file contains primary reading content or not. If a
content file contains auxiliary material that would normally appear at the
point of reference, but is not considered part of the main narrative, it
should be indicated as such so that readers can choose whether to skip
it.
For example, if you group all your chapter end notes in a separate content document, you could indicate their auxiliary status as follows:
<spine>
…
<itemref idref="chapter1"/>
<itemref idref="chapter1-notes" linear="no"/>
<itemref idref="chapter2"/>
<itemref idref="chapter2-notes" linear="no"/>
…
</spine>
A reader could now ignore these sections and continue following the primary narrative uninterrupted. But this capability is only a simple measure for distinguishing content that is primary at the macro level; it’s not effective in terms of distinguishing the primary narrative flow of the content within any document. (Although in the case of simple works of fiction that contain only a single unbroken narrative, it might be.)
Sighted readers don’t typically think about the logical reading order within the chapters and sections of a book, but that’s because they can visually identify the secondary content and read around it as desired. A reading system, however, doesn’t have this information to use for the same effect unless you add it (those semantics, again).
As I touched on in keeping style separate from content, you can, for example,
give a sidebar a nice colorful border and offset it from the narrative
visually using a div and CSS, but you’ve
limited the information you’re providing to only a select group when all you
use is style. Using a div instead of an aside element means a reading system will not
know by default that it can skip the sidebar if the reader has chosen to
only follow the primary narrative.
For someone listening to the book using a text-to-speech engine, the
narrative will be interrupted and playback of the sidebar div will be initiated when you mis-tag content in
this way. The only solution at the reader’s disposal might be to slowly move
forward until they find the next paragraph that sounds like a continuation
of what they were just listening to (div
elements aren’t always escapable). Picture trying to read and keep a thought
with the constant interruptions that can result from sidebars, notes,
warnings and all the various other peripheral text material a book might
contain.
For this reason, you need to make sure to properly identify content that is
not part of the primary narrative as such. The aside element is particularly useful when it comes to marking
text that is not of primary importance, but even seemingly small steps like
putting all images and figures in figure tags
allow the reader to decide what additional information they want presented.
I’ll be returning to how to tag many of these as we go, too.
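A minimal sketch of what that tagging might look like follows. The heading, file name, alternative text, and caption are invented for illustration, and the sidebar body is elided:
<aside>
<h2>A Note on Dialects</h2>
<p>…</p>
</aside>
<figure>
<img src="images/map.png" alt="Map of the region described in this chapter"/>
<figcaption>The region at the time of the story.</figcaption>
</figure>
Because the sidebar is an aside rather than a div, a reading system has what it needs to offer the reader a way to skip it.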
The EPUB 3 Structural Semantics Vocabulary is also a useful reference when it comes to which semantics and elements to apply to a given structure. Each of the semantics defined in this vocabulary indicates what HTML element(s) it is intended to be used in conjunction with.
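For instance, the vocabulary’s sidebar term is one that pairs with the aside element. As a hedged sketch, assuming the epub namespace prefix has been declared on the root html element and with the content elided:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
…
<aside epub:type="sidebar">
…
</aside>
…
</html>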