Chapter 2. Building a Better EPUB: Fundamental Accessibility

This guide takes a slightly different approach to accessibility because of the feature-rich nature of EPUB 3. Instead of grouping all the practices together under a single rubric of essentiality, I’m going to take a two-tier approach to making your content accessible.

This first section deals with the core text and image EPUB basics, while the second ventures into the wilder areas, like scripting and the new accessible superstructures you can build on top.

I’m going to start with a section on the fundamentals of accessible content, naturally enough, because if you get your foundation wrong, everything else degrades along with it.

A Solid Foundation: Structure and Semantics

The way to begin a discussion on the importance of structure and semantics is not by jumping into a series of seemingly detached best practices for markup, but by stopping for a moment to understand what these terms actually mean and why they’re so important to making data accessible. We’ll get to the guidelines soon enough, but if you don’t know why structure and semantics matter, you’re already on the fast track to falling into the kinds of bad habits that make digital data inaccessible, no matter the format.

Although the terms are ubiquitous in discussions of markup languages and data modeling generally (because they are so important to the quality of your data and your ability to do fantastic-seeming things with it), they are often bandied about in ways that make them sound geeky and inaccessible to all but data architects. I’m going to try to make them more approachable by showing how they facilitate reading for everyone.

Let’s start simple, though. You’re probably used to hearing the terms defined along these lines: structure is the elements you use to craft your EPUB content, and semantics is the additional meaning you can layer on top of those structures to better indicate what they represent.

But that’s undoubtedly a bit esoteric if you don’t go mucking around in your markup on a regular basis, so let’s take a more descriptive approach to their meaning. Another way to think about their importance and relationship is via a little reformulation of Plato’s allegory of the cave. In this dialogue, if you’ve forgotten your undergrad Greek philosophy, Socrates describes how the prisoners in the cave can only see shadows of the true forms of things on the walls as they pass in front of a fire, and only the philosopher kings will eventually break free of the chains that bind them in ignorance and come to see the reality of those forms.

As we reformulate Plato, the concept of generalized and specific forms is all that you need to take away from the original allegory, as getting from generalized to specific is the key to semantic markup. In the new content world view I’m proposing, the elements you use to mark up a document represent the generalized reflection of the reality you are trying to express. At the shadow level, so to speak, a chapter and a part and an introduction and an epilogue and many other structures in a book all function in the same way, like encapsulated containers of structurally significant content.

These general forms allow markup grammars, like HTML5, to be created without element counts in the thousands to address every possible need. A generalized element retains the form of greatest applicability at the expense of specifics, in other words. The HTML5 grammar, for example, solves the problem of a multitude of structural containers with only slightly differing purposes by introducing the section element.

But what help is generalized markup to a person, let alone a reading system, let alone to an assistive technology trying to use the markup to facilitate reading? Try making sense of a markup file by reading just the element names and see how far you get; a reading system isn’t going to fare any better despite a developer’s best efforts. HTML5 may now allow you to group related content in a section element, for example, but without reading the prose for clues all you know is that you’ve encountered a seemingly random group of content called section. This is structure without semantics.

You might think to make out the importance of the content by sneaking a peek ahead at the section’s heading—assuming it has one—but unless the heading contains some keyword like “part” or “chapter” you still won’t know why the section was added or how the content is important to the ebook as a whole. And cheating really isn’t fair, as making applications perform heuristic tests like looking at content can be no small challenge. This is both the power and failing of trying to process generalized markup languages and do meaningful things with what you find: you don’t have to account for a lot, but you also don’t often get a lot to work with.

Getting back to our analogy, though, it’s fair to say we’re all philosopher kings when it comes to the true nature of books; we aren’t typically interested in, and don’t typically notice, generalized forms when reading. But, whether we realize it or not, we rely on our reading systems being able to make sense of these structures to facilitate our reading, and much more so when deprived of sensory interactions with the device and content. When ebooks contain only generalized structures, reading systems are limited to presenting only the basic visual form of the book. Dumb data makes for dumb reading experiences, as reading systems cannot play the necessary role of facilitator when given little-to-nothing to work with. And that’s why not everyone can read all digital content.

It’s not always obvious to sighted readers at this point why semantics are important for them, though, as they just expect to see the visual presentation the forms provide and to navigate around with fingers and eyes. But that’s also because no one yet expects more from their digital reading experience than what they were accustomed to in print. Knowing whether a section is a chapter or a part as you skip forward through your ebook can make it so you don’t always have to rely on opening the table of contents. Knowing where the body matter section begins can allow a reading system to jump you immediately to the beginning of the story instead of the first page of front matter. Knowing where the body ends and back matter begins could allow the reading system to provide the option to close the ebook and go back to your bookshelf; it might also allow links to related titles you might be interested in reading next to be displayed. Without semantically rich data, only the most rudimentary actions are possible. With it, the possibilities for all readers are endless.

So, to wrap up the analogy, while some of us can read in the shadow world of generalized markup, all we get when we aim that low is an experience that pales in comparison to what it could be, and one that needlessly introduces barriers to access. If I’ve succeeded in bringing these terms into relief, you can hopefully now appreciate better why semantics and structure have to be applied in harmony to get the most value from your data. The accessibility of your ebook is very much a reflection of the effort you put into it. The reading system may be where the magic unfolds for the reader, but all data magic starts with the quality of the source.

With that bit of high-level knowledge under our belts, let’s now turn to how the two work together in practice in EPUB 3 to make content richer and more accessible.

Data Integrity

The most important rule to remember when structuring your content is to use the right element for the right job. It seems like an obvious statement, but too often people settle for the quick solution without thinking about its impact; look no further than the Web for examples of markup run amok. Print-to-digital exports are also notorious for taking the path of least complexity (p-soup, as I like to call the output that wraps most everything in paragraph tags). In fairness, though, print layout programs typically lack the information necessary for the export to be anything more than rudimentary.

When specialized tags are present, however, reading systems and assistive technologies can take advantage of them to do the right thing for you. There’s little they can do if you don’t give them any sense of what they’re encountering.

When it comes to EPUB 3, if you don’t know what’s changed in the new HTML specification, go and read the element definitions through; it’s worth the time. EPUB 3 uses the XHTML flavor of HTML5 for expressing text content documents, so knowledge of the specification is critical to creating good data. Don’t assume knowledge from HTML4, as the purpose of many elements has changed, and elements you thought you knew might have different semantic meanings now (especially the old inline formatting elements like i, b, small, etc.).
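As a rough illustration of how some of those inline elements are now defined, consider the following sketch (the sentences are invented for the example). In HTML5, i marks text in an alternate voice, such as a ship’s name or a foreign phrase; b marks text that is stylistically offset without any added importance, such as keywords; and small marks side comments like fine print. Emphasis and importance belong in em and strong:

```html
<!-- i: alternate voice (here, a ship's name), not emphasis -->
<p>We boarded the <i>Pequod</i> at dawn.</p>

<!-- b: stylistically offset keywords, with no added importance -->
<p>The <b>harpoon</b> and the <b>lance</b> are described in the next chapter.</p>

<!-- small: side comments such as fine print -->
<p><small>Copyright notices and other fine print go here.</small></p>

<!-- em and strong: stress emphasis and importance -->
<p>You <em>must</em> read the definitions. <strong>Do not skip this step.</strong></p>
```

Treat this as a starting point only; the element definitions in the specification remain the authority on what each tag means.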

And remember that structure is not about what you want an element to mean. The changes to the HTML5 element definitions may not always make the most sense (the restriction against using the cite element to mark up people’s names is one commonly cited example), but twisting definitions and uses to fit your own desires isn’t going to make you a friend of accessibility, either. Reading systems and assistive technologies are developed around the common conventions.

And whatever you do, don’t perpetuate the sin of immediately wrapping div and span tags around any content you don’t know how to handle. It’s a violation of the EPUB 3 specification to create content that uses generic elements in place of more specific ones, and it doesn’t take long to check if there really is no other alternative first. When you make up your own structures using generic tags, you push the logical navigation and comprehension of those custom structures onto the reader (and potentially mess up the HTML5 outline used for navigation). Sighted readers may not notice anything, but when reading flows through the markup, convoluted structures can frustrate the reader and interfere with their ability to effectively follow the narrative flow.

If you don’t discover an existing element that fits your need, the process of checking will typically reveal that you’re not alone in your problem, and that community-driven solutions have been developed. Standards and conventions are the friend of accessibility. And if you really don’t know and can’t find an answer, ask. The IDPF maintains discussion forums where you can seek assistance.

There are, of course, going to be many times when you have no choice but to use a generic tag, but when you do, always try to attach an epub:type attribute with a specific semantic (we’ll cover this attribute in more detail shortly). The more information you can provide, the more useful your data will be.
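Print page break markers are one case where a generic tag is hard to avoid. The following is a minimal sketch, assuming the pagebreak term from the EPUB 3 Structural Semantics Vocabulary; the id and page number are made up for illustration:

```html
<html … xmlns:epub="http://www.idpf.org/2007/ops">
    …
    <p>… the last words from the previous print page
        <!-- a generic span, but the semantic tells a reading system what it is -->
        <span epub:type="pagebreak" id="page-24" title="24"/>
        and the first words of the next …</p>
    …
</html>
```

Without the attribute, the span would be just another anonymous inline container; with it, a reading system has something it can act on, such as letting the reader skip page announcements.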

Take the converse situation into consideration when creating your content, too. You aren’t doing readers a service by finding more, and ever more complex, ways to nest simple structures. The more layers you add, the harder the content can be to navigate, as I already mentioned. Over-analyzing your data can be as detrimental to navigation as under-analyzing.

For persons who cannot visually navigate your ebook, this basic effort to properly tag your data reduces many of the obstacles of the digital medium. The ability to skip structures and escape from them starts with meaningfully tagged data. The ability to move through a document without going to a table of contents starts with meaningfully tagged data.

Note

Skipping and escaping are terms that will come up repeatedly in this guide. Skipping, as you might expect, is the ability to ignore elements completely, to skip by them. Accessible reading systems typically provide the ability for the reader to specify the constructs they wish to ignore, such as sidebars, notes, and page numbers. Escapable content typically consists of deeply nested or repetitive structures, such as those found in tables and lists, that a user may wish to move out from in order to continue reading at the next available item following the escaped content (a reading system’s user interface would normally provide quick access to the “escape” command, so that the operation can easily be called repeatedly, if needed).

The integrity of your data is also a basic value proposition. Do you expect to throw away your content and start over every time you need to re-issue, or do you want to retain it and be able to easily upgrade it over time? Structurally meaningful data is critical to the long-term archivability of your ebooks, the ability to easily enhance and release new versions as technology progresses, as well as your ability to interchange your data and use it to create other outputs. Start making bad data now and expect to be paying for your mistakes later.

Separation of Style

Some old lessons have to be continuously relearned and reinforced, and not mixing content and style is a familiar friend to revisit whenever talking about accessible data.

To be clear, separating style does not mean avoiding the style attribute and putting all your CSS in a separate file, even if that is another good practice we’ll get back to. What separation of style refers to is not expecting the visual appearance of your content to convey meaning to readers. Style is just a layer between your markup and the device that renders it, not an intrinsic quality you can rely on to say anything about your content. Typographic conventions had to convey meaning in print because that was all that was available, and are still useful for sighted readers, but are the wrong place now to be carrying meaning.

Some reading systems will give you the full power of CSS, while others won’t even have a screen for reading. Some readers will visually read your content, while others will be using nonvisual methods. If only the visual rendering of your content conveys meaning to the reader, you’re failing a major accessibility test. Leave style in that in-between layer where it targets visual readers, and keep your focus on the quality of your markup so that everyone wins.

The most basic rule of thumb to remember is that if you remove the CSS from your ebook, you should still be able to extract the same meaning from it as though nothing had changed. Your markup is what should ultimately be conveying your meaning. If you rely solely on position or color or whatever other stylistic flair you might devise, you’re taking away the ability of a segment of your readers to understand the content.
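To make the rule of thumb concrete, here is a small sketch that compares the two approaches. The class names and the sentence are invented for illustration:

```html
<!-- Meaning carried only by style: it disappears when the CSS is removed -->
<p>Do <span class="italic">not</span> open the valve.
    <span class="red-bold">The contents are under pressure.</span></p>

<!-- Meaning carried by markup: it survives with no CSS at all -->
<p>Do <em>not</em> open the valve.
    <strong>The contents are under pressure.</strong></p>
```

Both versions can be styled to look identical, but only the second still tells a text-to-speech engine or braille display that the words carry emphasis and importance.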

But there is something to be said for cleanly separating content from style at the file level, too. The cascading nature of styles means that the declaration closest to the element to be rendered wins. If you tack style attributes all over your content you can interfere with the ability of a reader to apply an alternate style sheet to improve the contrast, for example, or to change the color scheme, as the local definition may override the fix the reader is attempting to apply. Consequently, suggesting that you avoid the style attribute like the plague is actually not an overstatement.

More realistically, you should be able to use CSS classes for your needs. If, for some reason, you do have to add a style attribute, avoid using it to apply general stylistic formatting. Keeping your style definitions in a separate file simplifies their maintenance and facilitates their re-use on the production side, anyway, and this simple standard practice nets you an accessibility benefit.
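As a sketch of the difference at the file level, compare an inline declaration with a class whose rules live in a linked style sheet. The file name css/book.css and the class name caution are assumptions made up for this example:

```html
<head>
    …
    <!-- all presentation lives in one external, replaceable file -->
    <link rel="stylesheet" type="text/css" href="css/book.css"/>
</head>
<body>
    <!-- Avoid: a local declaration can defeat a reader's alternate style sheet -->
    <p style="color: #999999; background-color: #ffffff;">…</p>

    <!-- Prefer: a class that the linked style sheet (or the reader's own) can style -->
    <p class="caution">…</p>
</body>
```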

Semantic Inflection

I’m not going to rehash the reasons for semantic markup again, but I intentionally neglected getting into the specifics of how they’re added in EPUB 3 until now so as not to confuse the need with the technical details.

Adding semantic information to elements is actually quite simple to do; EPUB 3 includes the epub:type attribute for this purpose. You can attach this attribute to any HTML5 element so long as you declare the epub namespace. The following example uses the attribute to indicate that a dl element represents a glossary:

<html … xmlns:epub="http://www.idpf.org/2007/ops">
    …
     <dl epub:type="glossary">
         <dt><dfn>Brimstone</dfn></dt>
         <dd>Sulphur; See <a href="#def-sulphur">Sulphur</a>.</dd>
     </dl>
    …
</html>

Whenever you use unprefixed values in the attribute (i.e., without a colon in the name), they must be defined in the EPUB 3 Structural Semantics Vocabulary. All other values require a defined prefix and are typically expected to be drawn from industry-standard vocabularies. In other words, you cannot add random values to this attribute, like you can with the class attribute.

You can create your own prefix, however, and use it to devise any semantics you want, but don’t create these kinds of custom semantics with the expectation they will have an effect on the accessibility or usability of your ebook. Reading systems ignore all semantics they don’t understand and don’t have built-in processing for. It would be better to work with the IDPF or other interested groups to create a vocabulary that meets your needs if you can’t locate the semantics you need, as you’re more likely to get reading system support that way.
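Should you decide to declare a prefix, the EPUB 3 mechanism is the epub:prefix attribute on the root html element. The following is only a sketch: the ex prefix, its URL, and the recipe term are all invented for illustration and belong to no real vocabulary:

```html
<html … xmlns:epub="http://www.idpf.org/2007/ops"
    epub:prefix="ex: http://example.com/vocab/#">
    …
    <!-- a prefixed value; reading systems without built-in support will ignore it -->
    <section epub:type="ex:recipe">
        …
    </section>
    …
</html>
```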

The attribute is not limited to defining a single semantic, either. You can include a space-separated list of all the applicable semantics in the attribute.

A section, for example, often may have more than one semantic associated with it:

<section epub:type="toc backmatter">
    …
</section>

The order in which you add semantics to the attribute does not imply importance or affect accessibility, so the above could have just as meaningfully been reversed.

You should also be aware that this attribute is only available to augment structures; it is not intended for semantic enrichment of your content. Associating the personal information about an individual contained in a book so that a complete picture of their life can be built by metadata querying, for example, is not yet possible. The metadata landscape was considered too unstable to pick a method for enriching data, but look for a future revision to include this ability, whether via RDFa, microdata, or another method.

And in case it needs repeating, semantics are not just an exercise in labeling elements. As I discussed in the introduction to this section, these semantics are what enable intelligent reading experiences. If you had 25 definition lists in an ebook each with a particular use, how would a reading system determine which one represents the glossary if you didn’t have a semantic applied as in the first example? If you know which is the glossary, you could provide fast term lookups. The easier you make it for machines to analyze and process your data, the more valuable it becomes.

Language

Although the global language for the publication is set in the EPUB package file metadata, it’s still a good practice to specify the language in each of your content documents. In an age of cloud readers, assistive technologies might not have access to the default language if you don’t (unless the cloud reader rewrites your content file to include the information from the package document, which is a bad assumption to make). Without the default language, you can impair the ability of the assistive technology to properly render text-to-speech playback and affect how refreshable braille displays render characters.

An xml:lang attribute on the root html element is all it takes to globally specify the language in XHTML content documents. For compatibility purposes, however, you should also include the HTML lang attribute. Both attributes must specify the same value when they’re used.

We could indicate that a document is in German as follows:

<html … xml:lang="de" lang="de">

Similarly, for SVG documents, we add the xml:lang attribute to indicate that the title, description, and other text elements are in French:

<svg … xml:lang="fr">

You should also clearly identify any prose within your book that is in a different language from the publication:

<p>She had an infectious <i xml:lang="fr" lang="fr">joie de vivre</i> mixed with a
    certain <i xml:lang="fr" lang="fr">je ne sais quoi</i>.</p>

The xml:lang attribute can be attached to any element in your XHTML content documents (and the lang attribute is again included for compatibility). Properly indicating when the language of words, phrases, and passages changes allows text-to-speech engines to voice the words in the correct language and apply the proper lexicon files, as we’ll return to in more detail in the text-to-speech section.

Logical Reading Order

Although you’ll hear that all EPUB 3s have a default reading order, it’s not necessarily the same thing as the logical reading order, or primary narrative. The EPUB 3 spine element in the package document defines the order in which a reading system should render content files as you move through the publication. This default order enables a seamless reading experience, even though the publication may be made up of many individual content files (e.g., one per chapter).

But although the main purpose of the spine is to identify the sequence in which documents are rendered, you can use it to distinguish primary from auxiliary content files. The linear attribute can be attached to the child itemref elements to indicate whether the referenced content file contains primary reading content or not. If a content file contains auxiliary material that would normally appear at the point of reference, but is not considered part of the main narrative, it should be indicated as such so that readers can choose whether to skip it.

For example, if you group all your chapter end notes in a separate content document, you could indicate their auxiliary status as follows:

<spine>
    …
    <itemref idref="chapter1"/>
    <itemref idref="chapter1-notes" linear="no"/>
    <itemref idref="chapter2"/>
    <itemref idref="chapter2-notes" linear="no"/>
    …
</spine>

A reader could now ignore these sections and continue following the primary narrative uninterrupted. But this capability is only a simple measure for distinguishing content that is primary at the macro level; it’s not effective in terms of distinguishing the primary narrative flow of the content within any document. (Although in the case of simple works of fiction that contain only a single unbroken narrative, it might be.)

Sighted readers don’t typically think about the logical reading order within the chapters and sections of a book, but that’s because they can visually identify the secondary content and read around it as desired. A reading system, however, doesn’t have this information to use for the same effect unless you add it (those semantics, again).

As I touched on in keeping style separate from content, you can, for example, give a sidebar a nice colorful border and offset it from the narrative visually using a div and CSS, but you’ve limited the information you’re providing to only a select group when all you use is style. Using a div instead of an aside element means a reading system will not know by default that it can skip the sidebar if the reader has chosen to only follow the primary narrative.

For someone listening to the book using a text-to-speech engine, the narrative will be interrupted and playback of the sidebar div will be initiated when you mis-tag content in this way. The only solution at the reader’s disposal might be to slowly move forward until they find the next paragraph that sounds like a continuation of what they were just listening to (div elements aren’t always escapable). Picture trying to read and keep a thought with the constant interruptions that can result from sidebars, notes, warnings and all the various other peripheral text material a book might contain.

For this reason, you need to make sure to properly identify content that is not part of the primary narrative as such. The aside element is particularly useful when it comes to marking text that is not of primary importance, but even seemingly small steps like putting all images and figures in figure tags allows the reader to decide what additional information they want presented. I’ll be returning to how to tag many of these as we go, too.
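Here is a minimal sketch of what that tagging could look like, assuming the sidebar term from the EPUB 3 Structural Semantics Vocabulary. The image path, alt text, and caption are invented for the example:

```html
<p>… the primary narrative …</p>

<!-- secondary content: a reading system can offer to skip it -->
<aside epub:type="sidebar">
    <p>…</p>
</aside>

<!-- the image and its caption travel together as one unit -->
<figure>
    <img src="img/harbor.jpg" alt="Ships at anchor in a crowded harbor"/>
    <figcaption>Figure 1. The harbor at dawn.</figcaption>
</figure>

<p>… the primary narrative continues …</p>
```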

The EPUB 3 Structural Semantics Vocabulary is also a useful reference when it comes to which semantics and elements to apply to a given structure. Each of the semantics defined in this vocabulary indicates what HTML element(s) it is intended to be used in conjunction with.

Sections and Headings

As I touched on in the introduction to this section, always group related content that is structurally significant in section elements to facilitate navigation, and always indicate why you’ve created the grouping using the epub:type attribute:

<section epub:type="epilogue">
    …
</section>

The entries in the table of contents in your navigation document are all going to be structurally significant, which can be a helpful guide when it comes to thinking about how to properly apply the section element. Some additional ideas on structural significance can be gleaned from the terms in the EPUB 3 Structural Semantics Vocabulary. For example, a non-exhaustive list of semantics for sectioning content includes:

foreword
prologue
preface
part
chapter
epilogue
bibliography
glossary
index

Semantics are especially helpful when a section does not have a heading. Sighted readers are used to the visual conventions that distinguish dedications, epigraphs, and other front matter that may be only of slight interest, for example, and can skip past them. Someone who can’t see your content has to listen to it if you don’t provide any additional information to assist them.

Headingless, unidentified content also means the person will have to listen to it long enough to figure out why it’s even there. Have you just added an epigraph to the start of your book, and skipping the containing section will take them to the first chapter, or are they listening to an epigraph that starts the first chapter and skipping the section will take them to chapter two? These are the impediments you shift onto your reader when you don’t take care of your data.
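A short sketch shows how little it takes to remove that guesswork. The dedication and epigraph terms are assumed from the EPUB 3 Structural Semantics Vocabulary, and the text is invented:

```html
<!-- no heading, but the semantic says what the section is -->
<section epub:type="dedication">
    <p>For my parents.</p>
</section>

<section epub:type="epigraph">
    <blockquote>
        <p>…</p>
    </blockquote>
</section>
```

A reading system that recognizes these values could announce each section by name or let the reader skip straight past it.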

When the section does contain a heading, there are two options for tagging: numbered headings that reflect the current level or h1 headings for every section. At this point in time, using numbered headings is recommended, as support for navigation via the structure of the document is still developing:

<section epub:type="part">
    <h1>Part I</h1>

    <section epub:type="chapter">
        <h2>Chapter 1</h2>
        …
    </section>
</section>

Numbered headings will also work better for compatibility with older EPUB reading systems.

Using an h1 heading regardless of the nesting level of the section will undoubtedly gain traction moving forward, though. In this case, the h1 becomes more of a generic heading, as traversal of the document will occur via the document outline and not by heading tags (the construction of this outline is defined in HTML5). There is only limited support for this method of navigation at this time, however.
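For comparison, here is a sketch of the same part and chapter structure tagged with the h1-only approach:

```html
<section epub:type="part">
    <h1>Part I</h1>

    <section epub:type="chapter">
        <!-- still h1: the nesting of the section elements, not the number, sets the level -->
        <h1>Chapter 1</h1>
        …
    </section>
</section>
```

Given the limited support for outline-based navigation, treat this as a look ahead rather than a recommendation.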

And remember that titles are an integral unit of information. If you want to break a title across multiple lines, change font sizes, or do other stylistic trickery, use span elements and CSS and keep the display in the style layer. Never add a separate heading element for each segment.

To break a heading across lines, we could use this markup:

<h1>Chapter <span class="chapNum">One</span> Loomings.</h1>

and then add the following CSS rule to change the font size of the span and treat it as a block element (i.e., place the text on a separate line):

span.chapNum {
    display: block;
    margin: 0.5em 0em;
    font-size: 80%
}

If you fragment your data, you fragment the reading experience and cause confusion for someone trying to piece back together what heading(s) they’ve actually run into.

Context Changes

A nasty side-effect of current print-based export processes is that changes in context are styled visually, using CSS, images, or both. When you use the CSS margin-top property to add spacing, you hide from anyone who can’t see the display the fact that a change in context has occurred. Graphics to add whitespace are no better, since they don’t typically specify an alt value and are ignored by assistive technologies. Graphics that include asterisms or similar as the alt text are slightly better, but are still a suboptimal approach in that they don’t convey any meaning except through the reading of the alt value.

There are people who would argue that context breaks represent the borders between untitled subsections within sections, but from a structural and navigational perspective it’s typically not true or wanted, so don’t be too tempted to add section elements.

HTML5 has, in fact, addressed this need for a transitioning element by changing the semantics of the hr element for this purpose:

<p>… the world swam and disappeared into darkness.</p>

<hr class="transition"/>

<p class="nonindent">When next we met …</p>

By default this tag would render as a horizontal rule, but you can use CSS to turn off the effect and leave a more traditional space for visual viewing:

hr.transition {
    width: 0em;
    margin: 0.5em 0em;
}

or you could add a fleuron or other ornament:

hr.transition {
    background: url('img/fleuron.gif') no-repeat 50% 50%;
    height: 1em;
    margin: 0.5em 0em;
}

Styling the hr element ensures that the context change isn’t lost in the rush to be visually appealing.

Lists

You’d typically not expect to have to hear the advice that you should use lists for sets of related items, but rely too heavily on print tools to create your content and the result will be paragraphs made to look like list items, or single paragraphs that merge all the items together using only br tags to separate them.

If you don’t use proper list structures, readers can get stuck having to traverse the entire set of items before they can continue with the narrative flow (in the case of one paragraph per item) or having to listen to every item in full to hear the list (when br tags are used).

A list element, on the other hand, provides the ability both to move quickly from item to item and to escape the list entirely. It also allows a reading system to inform a reader how many items are in the list and which one they are at for referencing. Picture a list with tens or hundreds of items and you’ll get a sense for why this functionality is critical.

Using paragraphs for lists also leads people to resort to visual trickery with margins to emulate the deeper indentation that a nested list would have. These kinds of illusions hide from all but sighted readers the fact that a hierarchical relationship exists. The correct tagging allows readers to navigate the various levels with ease.
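As a sketch of the correct tagging, the hierarchy below is expressed by nesting one list inside an item of another rather than by indenting paragraphs with margins. The list content is invented for the example:

```html
<ol>
    <li>Prepare the content
        <!-- the nested list is a child of the item it belongs to -->
        <ul>
            <li>Tag the headings</li>
            <li>Tag the lists</li>
        </ul>
    </li>
    <li>Validate the publication</li>
    <li>Test in a reading system</li>
</ol>
```

A reader can now move item to item at either level, or escape the nested list and land on the second step.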

A final note is to always use the right kind of list:

the ol element is used when the order of the items is important.
the ul element is used when there is no significance or only weak significance to the order of the items (e.g., just because you arrange items alphabetically does not impart meaning to the order).
the dl element is used to define terms, mark up glossaries, etc.

Lists have these semantics for good purpose, so don’t use CSS to play visual games with them.

Tables

The reflowable, paginated nature of ebook reading has fortunately kept tables from being used for presentational purposes in ebooks. In theory, this should have been a good thing. The complex nature of tables relative to the limited rendering area of typical reading systems has led to the worse practice of excluding the data in favor of images of the table, however. How helpful is a picture of data to someone who cannot see it?

The motivating hope behind this practice seems to be that images will take away rendering issues on small screens, but don’t fall into this trap. Not only are you taking the content away from readers who can’t see the table, but even readers who can see the images will often find them scaled down to illegibility or bursting out the side of the reading area. And this happens on the very devices the technique is presumably meant to help (notably eInk readers that have no zooming functionality).

Consider also what you’re doing when you add a picture: you’re trying to address a situational disability (the inability to view an entire table at once) by creating another disability (only limited visual access to the content). If you properly mark up your data, readers can find ways to navigate it, whether via synthetic speech or other accessible navigation mechanisms. Obsessing about appearance is natural, but ask yourself how realistic a concern it should be when people read on cellphone screens? Give your readers credit to understand the limitations their devices impose, and give them the flexibility to find other ways to read.

When it comes to marking up tables, the fundamental advice for making them accessible from web iterations past remains true:

Always use th elements for header cells.
Wrap your header in a thead, in particular when including multi-row headings.
Use the th scope attribute to give the applicability of the heading (e.g., whether to the row or column). This attribute is not necessary for simple tables where the first row of th elements, or a th cell at the start of each row, defines the header(s), however.
If the header for a cell cannot be easily determined by its context, and especially when multiple cells in a multi-row header apply, add the headers attribute and point to the appropriate th element(s).

These heading requirements allow a person navigating your table to quickly determine what they’re reading at any given point in it, which is the biggest challenge that tables pose outside of perhaps escaping from them. It’s easy to get lost in a sea of numbers, otherwise.

The following example shows how these practices could be applied to a table of baseball statistics:

<table>
    <caption>1927 New York Yankees</caption>
    <thead>
        <tr>
            <th rowspan="2">Player</th>
            <th id="reg-hd" colspan="3">Regular Season</th>
            <th id="post-hd" colspan="3">Post Season</th>
        </tr>
        <tr>
            <th id="reg-ab">At Bats</th>
            <th id="reg-hits">Hits</th>
            <th id="reg-avg">Average</th>
            <th id="post-ab">At Bats</th>
            <th id="post-hits">Hits</th>
            <th id="post-avg">Average</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Lou Gehrig</td>
            <td headers="reg-hd reg-ab">584</td>
            <td headers="reg-hd reg-hits">218</td>
            <td headers="reg-hd reg-avg">.373</td>
            <td headers="post-hd post-ab">13</td>
            <td headers="post-hd post-hits">4</td>
            <td headers="post-hd post-avg">.308</td>
        </tr>
    </tbody>
</table>

The headers attribute on the td cells identifies both whether the cell contains a “Regular Season” or “Post Season” statistic as well as the particular kind of stat from the second header row. The value of this tagging is that a reading system or assistive technology can now announce to the reader that they are looking at “regular season hits” when presented the data for the third column, for example.

There’s also no reason why this functionality can’t be equally useful to sighted readers, except that it’s rarely made available. We just talked about the problem of visually rendering table data on small screens, and there’s an obvious solution here to the problem a sighted reader will have of seeing perhaps only a few cells at a time and not having the visual context of what they’re looking at. But whether mainstream devices begin taking advantage of this information to solve these problems remains to be seen.
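
For a simpler table, where the headers attribute would be overkill, the scope attribute mentioned in the list above can carry the same information. The following is only a sketch: it reuses the regular season figures from the earlier example, and marking the player name as a row header is an illustrative choice rather than a requirement:

<table>
    <caption>1927 New York Yankees, Regular Season</caption>
    <thead>
        <tr>
            <th scope="col">Player</th>
            <th scope="col">At Bats</th>
            <th scope="col">Hits</th>
            <th scope="col">Average</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <th scope="row">Lou Gehrig</th>
            <td>584</td>
            <td>218</td>
            <td>.373</td>
        </tr>
    </tbody>
</table>

With only one header row and one header column, each data cell’s headers can be worked out from its position, so no id or headers attributes are needed.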

It’s also good practice to provide a summary of complex tables to orient readers to their structure and purpose in advance, but the summary attribute has been dropped from HTML5. This loss is slightly less objectionable than the longdesc attribute removal we’ll touch on when we get to images, as prose attributes have many limitations—from expressivity to international language support.

The problem is that HTML5 doesn’t replace these removals with any mechanism(s) to allow the discovery of the same information, instead deferring to the aria-describedby attribute to point to the information (see the scripting section for more on WAI-ARIA). This attribute, however, may make the information even less generally discoverable to the broader accessibility community, as only persons using assistive technologies will easily find it.

The proposed HTML5 solutions for adding summaries, like using the caption element, also don’t take into account the need to predictably find this information before presenting the table. The information can’t be in any of a number of different places with the onus on the person reading the content to find it.

But throwing our collective hands up in the air isn’t a viable solution, either. The details element could work as a non-intrusive mechanism for including descriptions, at least until a better solution comes along. This element functions like a declarative show/hide box. Unfortunately, it suffers from a lack of semantic information that the epub:type attribute cannot currently remedy (i.e., there are no terms available for identifying whether the element contains a summary or description or something else). We instead have to use a child summary element to carry a prose title, as in the following example:

<details>
    <summary>Summary</summary>
    <p>…</p>
</details>

(The value of the summary element represents the clickable text used to expand/close the field and can be whatever you choose.)

If we then take a small liberty with the meaning of the aria-describedby attribute to also include summary descriptions, we could reformulate the HTML5 specification example to include an explicit pointer to the details element:

<table aria-describedby="tbl01-summary">
    <caption>
        Characteristics with positive and negative sides.
        <details id="tbl01-summary">
            <summary>Summary</summary>
            <p>Characteristics are given in the second column…</p>
        </details>
    </caption>
    …
</table>

In this markup, a nonvisual reader can now find the summary when encountering the table, while a sighted reader will only be presented the option of whether to expand the details element. It may not prove a great solution in the long run, but until the landscape settles it’s the best on offer.

Figures

Before descending into another accessibility attribute pain point, let’s come up for a quick breath of fresh air: HTML5 introduces the handy new figure element, along with its figcaption child, for encapsulating content associated with an image, table, or code example. Grouping related content elements together, as is becoming an old theme now, makes it simpler for a reader to navigate and understand your content:

<figure>
    <img src="images/blob.jpeg" alt="the blob"/>
    <figcaption>
        Figure 3.7 &#x2014; The blob is digesting Steve McQueen in this
        unreleased ending to the classic movie.
    </figcaption>
</figure>

Unfortunately, there is little support for the figure and figcaption elements at this time, so they get treated as no better than div elements. That said, it’s still preferable to future-proof your data and do the right thing, as support will catch up, especially since the only other alternative is semantically meaningless div elements.

Images

Images present a challenge for a variety of disabilities, and the means of handling them are not new, but HTML5 has added a new barrier in taking away the longdesc attribute for out-of-band descriptions. As I discussed for tables, you’re now left to find ways to incorporate your accessible descriptions in the content of your document.

If only to keep consistent with the earlier suggestion for tables, wrapping the img element in a figure and using a details element as a child of the figcaption may suit your needs, as shown in the following example:

<figure aria-describedby="fig01-desc">
    <img src="images/blob.jpeg" alt="the blob"/>
    <figcaption>
        Figure 3.7 — The blob is digesting Steve McQueen in
        this unreleased ending to the classic movie.
        <details id="fig01-desc">
            <summary>Description</summary>
            <p>
                In the photo, Steve McQueen can be seen floating within the
                gelatinous body of the blob as it moves down the main
                street …
            </p>
        </details>
    </figcaption>
</figure>

Another option is to include a hyperlinked text label to your long description:

<figure>
    <p><a href="blob-desc.xhtml">Description</a></p>
    <img src="images/blob.jpeg" alt="the blob"/>
    <figcaption>
        Figure 3.7 — The blob is digesting Steve McQueen in this
        unreleased ending to the classic movie.
    </figcaption>
</figure>

which would allow the accessible description to live external to the content. You’ll notice I haven’t added an aria-describedby attribute to this example because only the prose of the associated element gets presented to a reader using an assistive technology. In this case, the word “Description” would be announced, but the reader would not be presented with the option to link to the description.

Continuing to make the case for longdesc, or a better equivalent alternative, is the best course of action, however.

But that muckiness aside, it’s much more pleasant to note that the alt attribute has not changed, even if confusion around its use still abounds. The alt attribute is not a short description; it’s intended to provide a text equivalent that can replace the image for people for whom the image is not accessible.

Best practices for writing the alternative text extend beyond what we can realistically cover in a guide about EPUB 3, and resources can be easily located on the Web if you’re not clear about the distinction between an alt text and description. A good free reference written by Jukka Korpela is available at http://www.cs.tut.fi/~jkorpela/html/alt.html

Of particular note for accessible practices, however, is that even though the alt attribute always has to be present on images, it does not always have to contain a text alternative:

<img src="rounded-corner.jpg" alt=""/>

This little fact often gets overlooked. If you add text to an alt attribute, you’re indicating that the image is meaningful to the content and requesting that the reader pay attention to it. Images that only exist to make content look pretty should include empty alt attributes, as that allows reading systems and assistive technologies to skip readers past them without interrupting their reading experience.

SVG

Rounding out the tour of image functionality is SVG. It comes up for debate every so often just how accessible SVG really is, and while you can argue that it can be more accessible than non-XML formats like JPEG and PNG, there’s no blanket statement like “SVG is completely accessible” that can be applied. Like all content, an SVG is only as accessible as you make it, and when you start scripting one, for example, you can fall into all the typical inaccessibility traps.

The advantages of SVG for accessibility are noteworthy, though. You can scale SVG images without the need for specialized zoom software (and without the typical pixelation effect that occurs when zooming raster formats), the images are assistive technology-friendly when it comes to scripting and can be augmented by WAI-ARIA, and you can add a title and a description directly to the markup without resorting to the messy techniques the img element requires:

<svg:svg xmlns:svg="http://www.w3.org/2000/svg">
    <svg:title>Figure 1.1, The Hydrologic Cycle</svg:title>
    <svg:desc>
        The diagram shows the processes of evaporation, condensation,
        evapotranspiration, water storage in ice and snow, and
        precipitation. …
    </svg:desc>
    …
</svg:svg>

Note that the SVG working group also provides a guide to making accessible SVGs that should also be consulted when creating content: http://www.w3.org/TR/SVG-access/

The accessibility hooks are also why SVG has been promoted up to a first-class content format (i.e., your ebook can contain only SVG images; they don’t have to be embedded in XHTML files). But if you are going to go with an image-only ebook, the quality of your descriptions is going to be paramount, as they will have to tell the story that is lost in your visual imagery. And to be frank, sometimes descriptions will simply fail to capture the richness and complexity of your content, in which case fallback text serializations should be considered.
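
As a minimal sketch of what first-class status means in practice, an SVG file can be listed in the package manifest and referenced from the spine just like an XHTML content document. The id and href values below are hypothetical, and the rest of the package file is elided:

<manifest>
    …
    <item id="plate01" href="plate01.svg" media-type="image/svg+xml"/>
</manifest>
<spine>
    <itemref idref="plate01"/>
</spine>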

MathML

Why is MathML important for accessibility? Consider the following simple description of an equation: the square root of a over b. If you hastily added this description to an image of the corresponding equation, what would you expect a reader who couldn’t see your image to make of it? Did you mean they should take the square root of a and divide that by b, or did you mean for them to take the square root of the result of dividing a by b?

The lack of MathML support until now has resulted in these kinds of ambiguities arising in the natural language descriptions that accompanied math images. Ideally your author would describe all their formulas, but the ability to write an equation doesn’t always translate into the ability to effectively describe it for someone who can’t see it. And sometimes you have to make do with the resources you have available at hand at the time you generate the ebook, and lacking both academic and description expertise is a recipe for disaster.

MathML takes the ambiguity out of the equation, as assistive technologies have come a long way in terms of being able to voice math equations now. There are even Word plugins that can enable authors to visually create equations for you without having to know MathML, and tools that can convert LaTeX to MathML. The resources are out there to support MathML workflows, in other words.

But although EPUB 3 now provides native support for MathML, it is still a good practice to include an alternate text fallback using the alttext attribute, as not all reading systems will support voicing of the markup:

<m:math
    xmlns:m="http://www.w3.org/1998/Math/MathML"
    alttext="Frac Root a EndRoot Over b EndFrac">
    <m:mfrac>
        <m:msqrt>
            <m:mi>a</m:mi>
        </m:msqrt>
        <m:mi>b</m:mi>
    </m:mfrac>
</m:math>
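
For comparison, here is a sketch of how the other reading of the description, the square root of the whole fraction, would be marked up. The alttext value simply follows the pattern of the previous example and is an assumption about how you might choose to word it:

<m:math
    xmlns:m="http://www.w3.org/1998/Math/MathML"
    alttext="Root Frac a Over b EndFrac EndRoot">
    <m:msqrt>
        <m:mfrac>
            <m:mi>a</m:mi>
            <m:mi>b</m:mi>
        </m:mfrac>
    </m:msqrt>
</m:math>

The only difference from the previous example is which element contains the other, and that nesting is exactly the information a prose description tends to lose.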

If the equation cannot be described within an attribute (e.g., it would surpass the 255 character limit, requires markup elements, like ruby, to fully describe, etc.), it is recommended that the description be written in XHTML and embedded in an annotation-xml element as follows:

<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
    <m:semantics>
        <m:mfrac>
            …
        </m:mfrac>
        <m:annotation-xml
            encoding="application/xhtml+xml"
            name="alternate-representation">
            <span xmlns="http://www.w3.org/1999/xhtml">
                Frac Root a EndRoot Over b EndFrac
            </span>
        </m:annotation-xml>
    </m:semantics>
</m:math>

Note that a semantics element now surrounds the entire equation. This element is required in order for the addition of the annotation-xml element to be valid.

Footnotes

Footnotes present another challenge to reading enjoyment. Prior to EPUB 3, note references could not be easily distinguished from regular hyperlinks, and the notes themselves were typically marked up using paragraphs and divs, which impeded the ability to skip through them or past them entirely.

Picture yourself in a position where you might have to skip a note or two after every paragraph before you can continue reading, and where you have to listen to the start of each new paragraph to determine whether it’s a note or a continuation of the text. The practice of clumping all notes at the end of a section is slightly more helpful, but still interferes with the content flow however you read.

The epub:type attribute helps solve both these problems when used with the new HTML aside element, as in the following example:

<p>…<a epub:type="noteref" href="#n1">1</a> …</p>

<aside epub:type="footnote" id="n1">
    …
</aside>

The “noteref” term in the epub:type attribute identifies that the link points to a note, which allows a reading system to alert the reader they’ve encountered a footnote reference. It also provides the reader the ability to tell the reading system to ignore all such links if she wants to read the text through uninterrupted. Don’t underestimate the irritation factor of constant note links being announced!

Likewise, the aside element has also been identified as a footnote, permitting the reading system to skip it if the reader has chosen to turn off footnote playback. Putting the note in an aside also indicates that the content is not part of the main document flow.

But footnotes are often a nuisance for all readers; sighted readers typically care just as little to encounter them in the text. Identifying all your notes could also allow sighted readers to automatically hide them if they prefer them to not be rendered, saving sometimes limited screen space for the narrative prose. A configurable reading system that lets you decide what content you want to see is within reach with semantically meaningful data.
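
One practical detail the earlier footnote snippet leaves out is that the epub prefix has to be declared before epub:type can be used. A minimal sketch of a complete content document follows; the title and the note text are placeholders invented for illustration:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
    <head>
        <title>Chapter 1</title>
    </head>
    <body>
        <p>The first edition sold poorly.<a epub:type="noteref" href="#n1">1</a></p>

        <aside epub:type="footnote" id="n1">
            <p>1. Sales figures are taken from the publisher’s records.</p>
        </aside>
    </body>
</html>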

Page Numbering

It might seem odd to talk about page numbering in a digital format guide, but ebooks have been used by students the world over for more than a decade to facilitate their learning in a world only just weaning itself off print. Picture yourself using an ebook in a classroom where print books are still used. When the professor instructs everyone to open their book to a specific page, your ebook will be most unhelpful if you can’t find the same location. Or think about trying to quote a passage from a novel in your final paper and not being able to indicate where in the print source it came from. Page numbers are not an antiquated concept quite yet.

The practice to date has been to include page numbers using anchor tags, as in the following example:

<a name="page361"/>

But unless a reading system does a heuristic inspection of the name attribute’s value to see if it starts with “page” or “pg” or “p” there’s not going to be a lot of value to this kind of tagging for readers. These kinds of anchor points did give a location for navigating from the NCX page list, and they did keep the number from being rendered, but the number itself becomes lost data.

EPUB 3 once again calls on the epub:type attribute to include better semantics:

<span id="page361" epub:type="pagebreak">361</span>

It’s now clearly stated what the span contains, and the page number no longer has to be extracted from an attribute and separated from a page identifier label. It’s now up to the reader and their reading system to determine when and how to render this information, if at all.
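
The EPUB 3 navigation document can then point at these spans from a page list, taking over the role the NCX page list used to play. The following is a sketch only; the chapter12.xhtml file name and the second entry are hypothetical:

<nav epub:type="page-list">
    <ol>
        <li><a href="chapter12.xhtml#page361">361</a></li>
        <li><a href="chapter12.xhtml#page362">362</a></li>
    </ol>
</nav>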

One note when you do include page numbering is to remember that you should also include the ISBN of the source it came from in the package metadata:

<dc:source>urn:isbn:9780375704024</dc:source>

Inclusion of the ISBN is recommended as it can be used to distinguish between hardcover and softcover versions, and between different editions, of the source book. All of these typically would have different pagination, which would affect the ability of the reader to accurately synchronize with the print source in use.
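
To show where this element lives, here is a sketch of the package metadata with the Dublin Core namespace declared. Only the dc:source value comes from the example above; the pub-id identifier is a hypothetical name, and everything else is elided:

<package xmlns="http://www.idpf.org/2007/opf"
         version="3.0"
         unique-identifier="pub-id">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:identifier id="pub-id">…</dc:identifier>
        <dc:title>…</dc:title>
        <dc:language>en</dc:language>
        <dc:source>urn:isbn:9780375704024</dc:source>
        <meta property="dcterms:modified">…</meta>
    </metadata>
    …
</package>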

Including the ISBN will ensure that students, teachers, professors, and other interested parties can verify whether the digital edition matches the course criteria. Of course, the ideal will arrive the day everyone is using digital editions and sharing bookmarks, and maybe even auto-synchronizing with the professor’s edition.

But there are other settings beyond the educational where page numbers can be useful. Reading is also a social activity, and being able to reference by page numbers in leisure books allows for easier participation in reading groups, for example.

The world isn’t completely digital yet, so don’t dismiss out of hand the need for print-digital referencing when you’re producing both formats for a book.