Functional Requirements for an Accessible Streaming Media Technology
Multi-media, as its name implies, is content that delivers multiple streams of information simultaneously, with the total of those streams making up the "complete" information package for the end user. However, when one or more of a user's senses is absent or impaired, ensuring that the "complete" package is nonetheless delivered becomes a challenge for both the content author and the end user. Content authors are both obligated and wise to ensure that alternative methods of delivering key information are present to address any shortcomings users may experience.
The W3C (World Wide Web Consortium), through its Web Accessibility Initiative, has identified a number of "Accessibility Guidelines" that should be considered when seeking to distribute multi-media content via the Internet (or other electronic network). These Guidelines (from the 1999 Web Content Accessibility Guidelines - http://www.w3.org/TR/WCAG10/full-checklist.html) follow:
Key Guidelines:
- 1.1 Provide a text equivalent for every non-text element (e.g., via "alt", "longdesc", or in element content). This includes: images, graphical representations of text (including symbols), image map regions, animations (e.g., animated GIFs), applets and programmatic objects, ascii art, frames, scripts, images used as list bullets, spacers, graphical buttons, sounds (played with or without user interaction), stand-alone audio files, audio tracks of video, and video.
- 1.3 Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation.
- 1.4 For any time-based multimedia presentation (e.g., a movie or animation), synchronize equivalent alternatives (e.g., captions or auditory descriptions of the visual track) with the presentation.
Concurrent with the W3C Guidelines, the US Government's Section 508 Requirements (http://section508.gov/index.cfm?FuseAction=Content&ID=12#Web) also state:
- A text equivalent for every non-text element shall be provided (e.g., via "alt", "longdesc", or in element content).
- Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation.
Solution vendors seeking to do business with the Federal government, or with any other entity that draws funding from public sources (tax-payer resources), are generally covered by Section 508 requirements.
Explanations of W3C Guidelines:
- Guideline 1.1:
- When multi-media is embedded into a web page, not all users or user-agents (aka browsers) may be able to "see" or otherwise interact with the media file. A text alternative should therefore be provided in the page's source code, so that these users at least know what it is that they cannot access.
- Guideline 1.3:
- A text-based "dialogue" of key actions or interactions of the media (usually video) should be provided. For example:
…The camera pans from the announcer's face to the black smoke emerging from the smoke-stack directly behind him. The camera then pans to the village beside the smoke-stack, showing how the pollution pattern falls directly over the village.
While minute details are far from critical, a "general understanding" dialogue gives non-visual users a means of visualizing the scene being played out on screen. Ideally this text would be synchronized with the streamed media, but a separate, static text file meets the minimal functional requirements of this guideline. Descriptive narrative files should be named or labeled in such a way as to be clearly associated with the media file in question.
- Guideline 1.4:
- A text-based transcript of the on-screen dialogue is critical for non-hearing users, or users who for whatever reason cannot access the audio stream. Most people are familiar with this concept as closed captioning or open captioning, because television transmissions are currently obligated by US law to include captioning. (Closed captioning can be toggled on or off by the end user, whereas open captioning is present for all users.) Ideally this text transcript would also be synchronized with the streaming media file; however, once again, a static text file meets the minimal functional requirements of this guideline. Audio captioning files should be named or labeled in such a way as to be clearly associated with the media file in question.
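As a rough illustration of what "synchronized" means in Guideline 1.4, the sketch below pairs each caption with a start and end time and looks up the caption that applies at a given playback position. It also shows how the same data flattens into the static text file that meets the minimal requirement. The `Cue` type, the function names, and the sample caption text and timings are all assumptions made for this example; they do not correspond to any particular media player or caption format.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    """One caption (hypothetical structure, not a real caption format)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def active_cue(cues: List[Cue], t: float) -> Optional[Cue]:
    """Return the cue that applies at playback time t, or None.

    Assumes cues are sorted by start time and do not overlap.
    """
    starts = [c.start for c in cues]
    i = bisect_right(starts, t) - 1  # index of the last cue starting at or before t
    if i >= 0 and t < cues[i].end:
        return cues[i]
    return None


def static_transcript(cues: List[Cue]) -> str:
    """Flatten timed cues into a plain-text fallback transcript."""
    return "\n".join(c.text for c in cues)


# Invented sample data for illustration only.
captions = [
    Cue(0.0, 3.5, "Good evening. Behind me is the plant's smoke-stack."),
    Cue(4.0, 8.0, "The smoke drifts directly over the village."),
]
```

A descriptive narrative track (Guideline 1.3) could be held in the same structure, with cues that describe the visual action rather than the spoken dialogue.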
Observations:
There are currently a number of competing solutions to the issue of captioned video. Many of the current on-line media solutions deploy a W3C-endorsed technology, SMIL (Synchronized Multimedia Integration Language), but each has its own "version" of it. Sadly, the differences between the media players' implementations of SMIL mean that each media player requires its own SMIL file: SMIL files today are not inter-operable. QuickTime, Real, Windows Media and Flash all currently have captioning capacity.
At this writing (July 2007), there are no commercial solutions that allow for a "triple stream" of information: the above-named media players can support a media stream and one text stream, but none can support a media stream and two text files (the caption file and the descriptive dialogue file). As a result, finding descriptive texts for media files on the internet today is nearly impossible. While the requirement exists, there are no widespread workable solutions to date. (It should be noted that some interesting experiments I have witnessed indicate that this is feasible; however, no such solution has yet emerged.)
A key benefit to content creators who ensure that the required text files are generated at the same time as the media is that these text files are far more searchable and "index-able" than media files alone. While media files do allow for a modicum of meta-data for this purpose, a full text transcript provides a depth of searchable content (in context) far greater than meta-data alone could ever hope to achieve. As Vincent Flanders of "Web Pages that Suck" fame stated:
"…why should I go to the trouble of making my site accessible?" Why? Because the most powerful Internet force known to God and man visits your web pages like blind people -- Google. Google is blind and reads your sites linearly -- as the code is sent to the browser -- and then tries to interpret what it "sees". Google doesn't care how pretty your page looks: Google cares about content."
Recommendations:
Any multi-media storage system should have as part of its solution a data-base scheme and storage capacity for a minimum of three files per media instance: the media file itself, the text caption file, and the text description file. An ideal server solution would further facilitate the simultaneous delivery of all or some of these associated files, configured as demanded by the end user.
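One way to picture the "three files per media instance" recommendation is as a single record that keeps the media file and its two text files together. The following is a minimal sketch under stated assumptions, not a prescribed schema: the class name, field names and methods are hypothetical, and the file names are taken from this document's own fictional example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MediaInstance:
    """One media item plus its two associated text files (hypothetical schema)."""
    title: str
    media_file: str
    caption_file: Optional[str] = None       # transcript / caption text
    description_file: Optional[str] = None   # descriptive narrative text
    player: str = ""                         # e.g. "QuickTime"
    mime_type: str = ""                      # e.g. "video/quicktime"

    def missing_files(self) -> List[str]:
        """Name the accessibility files that have not been supplied yet."""
        missing = []
        if not self.caption_file:
            missing.append("caption")
        if not self.description_file:
            missing.append("description")
        return missing

    def bundle(self, captions: bool = True, description: bool = True) -> Dict[str, str]:
        """Select the files to deliver, as requested by the end user."""
        files = {"media": self.media_file}
        if captions and self.caption_file:
            files["caption"] = self.caption_file
        if description and self.description_file:
            files["description"] = self.description_file
        return files


# Fictional record for illustration only.
lecture = MediaInstance(
    title="Guest lecturer Mr. Smith on Multi-Media in the Educational Environment",
    media_file="Smith.mov",
    caption_file="Smith_trans.txt",
    description_file="Smith_desc.txt",
    player="QuickTime",
    mime_type="video/quicktime",
)
```

Keeping the text files optional lets legacy media be stored before its transcript exists, while `missing_files()` makes the outstanding work easy to report.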
If "index" directory pages are generated dynamically by these solutions, individual, unique links to each associated file should also be provided, allowing maximum choice to the end user. Given the lack of inter-operability of current SMIL solutions, the media player (and associated MIME Type info) should also be clearly indicated to the end user. For example:
| Name | Media File | Transcript | Descriptive Narrative | File Format |
|---|---|---|---|---|
| Guest lecturer Mr. Smith on Multi-Media in the Educational Environment | Smith.mov (cc) | Smith_trans.txt | Smith_desc.txt | QuickTime / SMIL |
| Instructional Movie: How to build a Widget | Widget.rm (nc) | Widget_trans.txt | Widget_desc.txt | RealMedia |
In the above fictional example, "How to build a Widget" might be legacy content that has not yet been (or cannot or will not be) returned to post-production to produce the time-stamped text files that SMIL technology is structured around. Nonetheless, audio transcripts and descriptive narrative files can (should?) be provided to end users: this is not an ideal solution, but at a minimum it provides an equivalent, if not identical, experience. The fundamental goal is to ensure that the key information provided via the media resource is made accessible to all users regardless of any possible barrier that may exist - physiological or technical.
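A dynamically generated index of this kind could be produced along the following lines. This is only a sketch: the `Entry` type and both functions are hypothetical, and reading the "(cc)" and "(nc)" markers as "captioned" and "not captioned" is an assumption made for the example. The output is a pipe-delimited text table with one cell per associated file, so that a real index page could turn each cell into its own link.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One index row (hypothetical structure)."""
    name: str
    media_file: str
    captioned: bool               # assumed meaning: True -> "(cc)", False -> "(nc)"
    transcript: Optional[str]
    description: Optional[str]
    file_format: str


HEADER = ["Name", "Media File", "Transcript", "Descriptive Narrative", "File Format"]


def index_rows(entries: List[Entry]) -> List[List[str]]:
    """Build the header row plus one row of cells per media entry."""
    rows = [HEADER]
    for e in entries:
        marker = "(cc)" if e.captioned else "(nc)"
        rows.append([
            e.name,
            f"{e.media_file} {marker}",
            e.transcript or "not available",
            e.description or "not available",
            e.file_format,
        ])
    return rows


def render(rows: List[List[str]]) -> str:
    """Render the rows as a pipe-delimited text table."""
    lines = ["| " + " | ".join(r) + " |" for r in rows]
    lines.insert(1, "|" + "---|" * len(rows[0]))  # separator under the header
    return "\n".join(lines)


# The two fictional entries from the example table.
entries = [
    Entry("Guest lecturer Mr. Smith on Multi-Media in the Educational Environment",
          "Smith.mov", True, "Smith_trans.txt", "Smith_desc.txt", "QuickTime / SMIL"),
    Entry("Instructional Movie: How to build a Widget",
          "Widget.rm", False, "Widget_trans.txt", "Widget_desc.txt", "RealMedia"),
]
```

Calling `render(index_rows(entries))` yields the header, a separator line, and one line per entry; an entry without a transcript or description file would show "not available" in that cell rather than an empty link.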
