<video … controls="controls">

Because these native controls vary in appearance from one reading system to the next, however, there is a natural tendency to script custom players. From an accessibility perspective, there’s nothing wrong with doing so, as long as your developers are fluent in WAI-ARIA and ensure the custom controls are fully accessible.

But if you do create a custom control set using JavaScript, ensure that you still enable the native browser controls on the audio and video elements by default. If you don’t, only readers whose systems have JavaScript enabled will be able to access the audio or video content, and possibly not even all of them: depending on what scripting capabilities are available, even script-enabled systems may not render your controls. Your JavaScript code can always disable the native controls once it has determined that the system supports your custom controls.
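As a rough illustration of that fallback pattern, the sketch below leaves the controls attribute in the markup and turns it off only after the scripted controls have been built without error. The function names enhanceMedia and buildCustomControls are hypothetical, and the single play/pause button stands in for a real, fully accessible control set; treat it as one possible approach under those assumptions, not a finished player.

```javascript
// Hypothetical helper: builds a very small scripted control (a single
// play/pause button) and places it right after the media element.
function buildCustomControls(media, doc) {
  const button = doc.createElement("button");
  button.type = "button";
  button.textContent = "Play";
  button.addEventListener("click", () => {
    if (media.paused) {
      media.play();
    } else {
      media.pause();
    }
  });
  // Keep the label in sync however playback was started or stopped.
  media.addEventListener("play", () => { button.textContent = "Pause"; });
  media.addEventListener("pause", () => { button.textContent = "Play"; });
  media.insertAdjacentElement("afterend", button);
  return button;
}

// Hypothetical helper: returns true when the custom controls took over,
// false when the native controls were left in place as the fallback.
function enhanceMedia(media, doc) {
  try {
    buildCustomControls(media, doc);
  } catch (err) {
    return false; // scripting support is incomplete; keep native controls
  }
  media.controls = false; // removes the controls attribute from the element
  return true;
}

// Only runs where a DOM is available (that is, inside a reading system).
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    document.querySelectorAll("audio, video").forEach((media) => {
      enhanceMedia(media, document);
    });
  });
}
```

Because the native controls are switched off only as the last step, a reading system that ignores scripts entirely, or that fails partway through building the button, still presents its own controls to the reader.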

Timed Tracks

Improved access to the content and the playback controls is only one half of the problem; your content still needs to be accessible to be useful. To this end, both the audio and video elements allow timed text tracks to be embedded using the HTML5 track element.

If you’re wondering what timed text tracks are, you probably already know them by their practical names: captions, subtitles, and descriptions. A timed track provides the instructions for synchronizing text (or its rendering) with an audio or video resource: to overlay text as a video plays, to include synthesized voice descriptions, to provide signed descriptions, to allow navigation within the resource, and so on.
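In HTML5, these practical names surface as the kind of each track: subtitles, captions, descriptions, chapters (for navigation), or metadata. As a minimal sketch, assuming the reading system exposes the standard textTracks list on media elements, the hypothetical helper below groups whatever tracks are attached to an audio or video element by that kind, which can be a quick way to check what a given resource actually offers.

```javascript
// Hypothetical helper: summarizes the timed tracks attached to an audio
// or video element, grouped by their kind ("captions", "subtitles", ...).
function groupTracksByKind(media) {
  const groups = {};
  // textTracks is an array-like list; fall back to an empty list if the
  // reading system does not expose it.
  const tracks = media.textTracks || [];
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (!groups[track.kind]) {
      groups[track.kind] = [];
    }
    groups[track.kind].push({ label: track.label, language: track.language });
  }
  return groups;
}

// Possible use inside a reading system (not run here):
//   const summary = groupTracksByKind(document.querySelector("video"));
//   const hasCaptions = Boolean(summary.captions);
```

Returning a plain object keyed by kind keeps the check independent of any particular player, so the same helper could feed a custom track menu or a simple authoring-time audit.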

As I touched on when talking about accessibility at the start of the guide, don’t underestimate the usefulness of subtitles and captions. They are not a niche accessibility need. There are many cases where readers would prefer not to be bothered with the noise while reading, are reading in an environment where enabling sound would bother others, or are unable to hear clearly or accurately what is going on because of background noise (e.g., on a subway, bus, or airplane). The irritation they will feel at having to return to the video later, when they are in a more amenable environment, pales next to the frustration of someone who is not provided any access to that information.

It probably bears repeating at this point, too, that subtitles and captions are not the same thing, and both have important uses that necessitate their inclusion. Subtitles provide the dialogue being spoken, whether in the same language as the video or translated, and typically assume the reader is aware of which person is speaking. Captions, however, are descriptive: in addition to the dialogue (which typically shifts location on the screen to reflect the person speaking), they provide ambient and other context useful for someone who can’t hear what else might be going on in the video.

A typical aside at this point would be to show a simple example of how to create one of these tracks using one of the many available technologies, but plenty of these kinds of examples abound on the Web. Understanding a bit of the technology is not a bad thing, but, similar to writing effective descriptions for images, the bigger issue is having the experience and knowledge about the target audience to create meaningful and useful captions and descriptions. These issues are outside the realm of EPUB 3, so the only advice I’ll give is that if you don’t have the expertise, engage those who do. Transcription costs are probably much less than you’d expect, especially considering the small amounts of video and audio that ebooks will likely include.