This document explores how testability and page-based conformance verification of the WCAG 2.0 and 2.1 accessibility guidelines are challenging to apply to a broad range of websites and web applications. It also explores approaches for mitigating these challenges, to realize as accessible a site as possible.
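The challenges covered broadly fall into four main areas:
- scaling conformance verification when many success criteria require human judgment;
- large, complex, and dynamic websites, whose content changes faster than it can be verified;
- third party content that the site owner does not author or control;
- non-web information and communications technologies (ICT).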
The purpose of this document is to help understand those challenges more holistically, and explore approaches to mitigating those challenges, both so that sites can use these mitigation approaches now, and also so that we can address the challenges more fully in future accessibility guidelines such as WCAG 3.0 (now in early development) where the W3C Working Group Charter expressly anticipates a new conformance model.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.nbwuij.icu/TR/.
This is a First Public Working Draft by the Accessibility Guidelines Working Group. This document explores how testability and page-based conformance verification of the WCAG accessibility guidelines are challenging to apply to a broad range of websites and web applications. It also explores approaches for mitigating these challenges, to realize as accessible a site as possible. This draft is published to obtain public review of the issues identified and solutions proposed. After sufficient review, the Working Group plans to publish this document as a Working Group Note to inform other work.
Feedback is welcome on any aspect of this document. The Working Group particularly seeks feedback on the following questions:
To comment, file an issue in the W3C WCAG GitHub repository. Please indicate your issue is for this document by using the word "Challenges:" as the first word of your issue's title. Although the proposed Success Criteria in this document reference issues tracking discussion, the Working Group requests that public comments be filed as new issues, one issue per discrete comment. It is free to create a GitHub account to file issues. If filing issues in GitHub is not feasible, send email to email@example.com (comment archive).
This document was published by the Accessibility Guidelines Working Group as a First Public Working Draft.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 March 2019 W3C Process Document.
Assessing the accessibility of a website is of critical importance. Website authors want accessibility assessments of their websites so they can identify the places where visitors with disabilities may be unable to use the site, and address them as much as possible. External parties who have an interest in the accessibility of a website likewise want website assessments so they can understand whether the site meets their accessibility fitness criteria. To aid in this assessment, the Accessibility Guidelines Working Group (AGWG) of the World Wide Web Consortium (W3C) developed the Web Content Accessibility Guidelines (WCAG), containing both a rich set of success criteria to meet the needs of people with disabilities and conformance requirements for those criteria. Assessing conformance of a website to all of the success criteria is how accessibility assessments have been done to date, either by assessing every individual page or by using a page sampling approach.
Large websites are often highly complex, with substantial dynamic content, including content updates, new content, and user interface changes that happen almost continuously, perhaps at the rate of hundreds or even thousands of page updates per second. This is especially the case where third parties are actively populating and changing site content, such as website users contributing content. Ensuring that every one of these page updates fully satisfies all success criteria (as appropriate), especially where expert human review is required for some criteria, presents a massive scaling problem. Further, where pages are generated programmatically, finding every last bug related to that generation may prove challenging, especially when they only arise from uncommon content scenarios or combinations (and updates to those algorithms and code happen multiple times per week). Thus, the likelihood that every last page (out of what might be millions or billions of pages) can satisfy each and every success criterion 100% of the time is extremely low.
Assessing conformance of such sites to the Web Content Accessibility Guidelines (WCAG) 2.0 [wcag20] or 2.1 [wcag21] has proved difficult. The Web Content Accessibility Guidelines 2.0 list a set of normative requirements "in order for a web page to conform to WCAG 2.0," including setting forth that conformance "is for full Web page(s) only, and cannot be achieved if part of a Web page is excluded," along with a Note that states "If a page cannot conform (for example, a conformance test page or an example page), it cannot be included in the scope of conformance or in a conformance claim." The conformance requirements also set forth what is allowed in any optional "Conformance Claims," starting with text that states: "Conformance is defined only for Web pages. However, a conformance claim may be made to cover one page, a series of pages, or multiple related Web pages." For the purposes of this document, we use the term "WCAG 2.x conformance model" to refer to the normative text in the Conformance section of WCAG 2.0 and WCAG 2.1.
This WCAG 2.x conformance model contains a mitigation related to partial conformance for 3rd party content (see Sec. 3.1: Treatment of 3rd party content and Statements of Partial Conformance below). Further, in recognition of these challenges, the W3C Note Website Accessibility Conformance Evaluation Methodology (WCAG-EM) 1.0 [wcag-em] was published in 2014 to provide "guidance on evaluating how well websites conform to the Web Content Accessibility Guidelines." This W3C document "describes a procedure to evaluate websites and includes considerations to guide evaluators and to promote good practice," which can help organizations to make a conformance claim, while acknowledging that there may be errors on pages not in the sample set, or errors that automated evaluation tools did not pick up on pages outside the human evaluation sample. While WCAG-EM is useful for providing confidence in a prior claim of 100% conformance across a website, or as part of an internal process to help an organization assess its progress toward 100% conformance, in and of itself it doesn't address the challenges in making every last aspect of every page conform 100% to every success criterion.
Also, the Authoring Tool Accessibility Guidelines 2.0 (ATAG) [atag20] "provides guidelines for designing web content authoring tools that are both more accessible to authors with disabilities" and "designed to enable, support, and promote the production of more accessible web content by all authors." Leveraging authoring tools can significantly help with a number of the challenges with accessibility guidelines conformance and testing, and in future versions of this document we hope to describe those in more detail.
Further, the Research Report on Web Accessibility Metrics [accessibility-metrics-report] from the Research and Development Working Group (RDWG) explores the main qualities that website accessibility metrics need to consider in order to communicate accessibility in a simple form such as a number. Among the most interesting qualities this research report explored are the severity of an accessibility barrier and the time it takes for a site visitor to conduct a task, as an alternative approach to conformance-based metrics. Unfortunately, this research report was last updated in May 2014 and has not progressed to being a published W3C Note. The Research and Development Working Group was disbanded in 2015, and the document was never advanced to contain guidance on the specific non-conformance-based qualities that should be used.
Finally, the Accessibility Conformance Testing Task Force and ACT Rules Community Group (ACT-R) are working to standardize accessibility conformance testing methodologies. They are doing so by defining a W3C specification, Accessibility Conformance Testing (ACT) Rules Format 1.0 [act-rules-format-1.0], published as a W3C Recommendation in 2019, as well as by considering ways to output metrics around what the tests find. This could be very useful as an alternative to the current 100% perfect conformance model. Whenever possible, they are turning WCAG success criteria into automated test rules to enable accessibility testing on a large scale. When effective automated tests are not possible, they are writing semi-automated or manual accessibility tests that are both informative and valuable. This work does not, however, address scaling tests that require human involvement, the challenges of third-party content, or the problem of programmatically generated web pages. So, fundamentally, while they are all substantial contributions to the field of web accessibility, none of WCAG-EM, ATAG, or ACT, as they stand today, can fully address the many challenges described in this document.
Separately, the phrase "substantially conforms to WCAG" is coming into use as one way of conveying the status of a website that is broadly accessible, but not 100% perfect, given the challenges noted above and described more fully below. Unfortunately, that phrase has no W3C definition today, nor does it actually address the accessibility challenges with testing and conformance themselves.
While the challenges discussed in this document apply to websites and web applications broadly, this early version of the document focuses particularly on situations involving large, dynamic, and complex websites. There are valid reasons WCAG 2 and related resources have the conformance model they do, and the issues raised in this document do not mean sites should not strive to conform to WCAG 2. Specifically, where they can, they are invited to utilize the approaches for mitigating these challenges, as noted below, even though these mitigation approaches aren't sufficient to reach 100% conformance due to the nature of the challenges. Finally, a new version of accessibility guidelines, W3C Accessibility Guidelines 3.0 (see WCAG 3.0 Draft), rethinks all aspects of accessibility guidance in a present-day setting and is expressly chartered to develop a new conformance model that should address these challenges.
As noted above, in addition to describing in detail the challenges of assessing conformance, this document also explores some approaches to mitigating those challenges. Each of the main challenges sections below describes one or more significant mitigation approaches.
While some approaches may be more applicable to a particular website design than others, and not all approaches may be appropriate or practical for a particular website, it is likely that many websites can utilize at least some of these approaches. Website authors are encouraged to utilize as many of these approaches as possible to minimize these challenges and maximize the likelihood that all website visitors will be able to use the site effectively. Though the challenges described in this document illustrate that it is not possible for large, complex, and/or dynamic websites to meet the 100% perfection standard of WCAG 2.x conformance, mitigation strategies might enable a substantial level of conformance that lets people with disabilities use websites effectively and with little difficulty.
This document has two key goals:
A better understanding of the situations in which the WCAG 2.x conformance model may be difficult or impossible to apply, and the places where accessibility conformance verification may present difficulties in scaling, could lead to more effective conformance models and testing approaches in the future.
It is important to recognize that success criteria in WCAG 2.x are quite distinct from the conformance model. These criteria describe approaches to content accessibility that are thoughtfully designed to enable people with a broad range of disabilities to effectively consume and interact with web content. Challenges with the conformance model and with testing verification don't mean the criteria aren't valid. For example, while requiring human judgment to validate a page limits testing to a sample of templates, flows, top tasks, and the like (see Challenge #1 below), without that human judgment it may not be possible to deliver a page that fully conforms to WCAG 2.x. Similarly, while it may not be possible to ensure that all third party content is fully accessible (see Challenge #3 below), without review of that content by a human sufficiently versed in accessibility it may again not be possible to deliver pages containing third party content that fully conform to WCAG 2.x. Human judgment is a core part of much of WCAG 2.x for good reasons, and it is important to grapple successfully with the challenges that arise from it.
One of the reasons for publishing this draft document is to seek additional contributions from the wider web community describing any additional challenges, or further illustration of challenges in the existing identified areas below, as well as contributions to the mitigation approaches described herein, to provide further guidance for addressing these challenges. We seek to gain a thorough understanding of the challenges faced by large, complex, and dynamic websites that are attempting to provide accessible service to their website users. It is expected that a more thorough understanding of these challenges can lead to either a new conformance model, or an alternative model that is more appropriate for large, complex, and/or dynamic websites. Ideally, such a model would also be able to distinguish between websites that are substantially accessible for most visitors with disabilities most of the time, and websites that are largely unusable by a significant portion of visitors with a disability.
This document also includes previously published research from the Silver Task Force and Community Group that was specifically related to Challenges with Accessibility Guidelines Conformance and Testing. There is some overlap between the challenges captured in this published research, and the challenges enumerated in the first 4 sections of this document. The research findings will be folded into the other sections as appropriate in future versions of this document.
Present for the first time in this current draft of this document is an initial discussion of approaches to mitigate the impact of the challenges cited. While we have heretofore emphasized systematically collecting a comprehensive inventory of challenges, we believe our collection is now sufficiently mature to begin enumerating and considering the various mitigating approaches that have come to light as a result of this work. We are publishing this document now to seek the widest possible public comment and assistance in further cataloging and characterizing both these challenges and these mitigation approaches, so that this work can become widely reviewed input into the next major revision of W3C accessibility guidelines (now chartered by W3C for eventual release as WCAG 3.0 and currently in early development under the name "Silver"), both directly and through the active discussion of conformance taking place in the Silver Task Force subgroup of the W3C Accessibility Guidelines Working Group (AGWG).
The following terms are used in this document:
A challenge common to many success criteria is the inability of automated testing to fully validate conformance, and the resulting time, cost, and expertise needed to perform the manual testing required to cover the full range of the requirements.
HTML markup can be automatically validated to confirm that it is used according to specification, but a human is required to verify whether the HTML elements used correctly reflect the meaning of the content. For example, text on a web page marked as contained in a paragraph element may not trigger any failure in an automated test, nor would an image with alternative text equal to "red, white, and blue bird," but a human will identify that the text needs to be enclosed in a heading element to reflect its actual use on the page, and also that the proper alternative text for the image is "American Airlines logo." Many existing accessibility success criteria expect informed human evaluation to ensure that the end users benefit from conformance.
The same can be said of very large web-based applications that are developed in an agile manner with updates delivered in rapid succession, often on an hourly basis.
We can think of this as the distinction between quantitative and qualitative analysis. We know how to automatically test for and count the occurrences of relevant markup. However, we do not yet know how to automatically verify the quality of what that markup conveys to the user. In the case of adjudging appropriate quality, informed human review is still required.
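As a minimal sketch of the quantitative side of this distinction, the following Python uses only the standard library's html.parser to count images and report which ones lack an alt attribute. The page markup, file names, and the ImgAltCounter and audit names are invented for illustration. The check correctly flags the image with no alt attribute, but it passes the "red, white, and blue bird" alternative text without complaint: deciding whether that text conveys what the image actually is (a logo) still needs informed human review.

```python
from html.parser import HTMLParser


class ImgAltCounter(HTMLParser):
    """Counts <img> elements and records which ones lack an alt attribute.

    This is a purely quantitative check: it can tell whether alt text is
    present, but not whether that text is appropriate for the image.
    """

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing_alt = []  # src values of images with no alt attribute
        self.alt_texts = []    # alt values that were present

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        attributes = dict(attrs)
        if "alt" in attributes:
            self.alt_texts.append(attributes["alt"])
        else:
            self.missing_alt.append(attributes.get("src", "<no src>"))


def audit(markup):
    """Parse an HTML fragment and return the populated counter."""
    parser = ImgAltCounter()
    parser.feed(markup)
    parser.close()
    return parser


page = """
<p>Fly with us</p>
<img src="logo.png" alt="red, white, and blue bird">
<img src="banner.jpg">
"""

result = audit(page)
print(result.total, result.missing_alt, result.alt_texts)
# prints: 2 ['banner.jpg'] ['red, white, and blue bird']
```

Note that the paragraph that should have been a heading is not flagged either; nothing in the markup alone says it is being used as a heading.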
There are a number of approaches for mitigating scaling challenges. For example, if pages can be built using a small number of page templates that are fully vetted for conformance to criteria relating to structure, heading use, and layout, then pages generated with those templates are much more likely to have well defined headings and structure. Further, if pages are limited to rendering images that come from a fully vetted library of images that have well defined ALT text, then issues with poor ALT text can be minimized if not entirely eliminated. Another approach that can be used in some situations is to encode website content in a higher-level abstraction than HTML (e.g. a wiki-based website, where content authors can specify that a particular piece of text is to be emphasized strongly [which would be rendered as a <strong> block], but they cannot specify that a particular piece of text is to be made boldface [so that <b> is never part of the website]). While none of these approaches can mitigate every challenge in conformance and testing with every success criterion, they are powerful approaches, where applicable, to help minimize accessibility issues in websites.
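A minimal sketch of the higher-level abstraction and vetted image library approaches together, in Python. The markup syntax (double asterisks for strong emphasis, [[image:id]] for images), the VETTED_IMAGES table, and the render function are all hypothetical and not taken from any real wiki engine. Because authors write only this markup, the renderer alone decides which HTML elements appear: strong emphasis always becomes <strong>, no syntax produces <b>, raw HTML is escaped, and an image can only be inserted together with ALT text that was reviewed in advance.

```python
import html
import re

# Hypothetical vetted image library: each entry maps an ID to (src, reviewed alt text).
VETTED_IMAGES = {
    "aa-logo": ("/img/aa-logo.png", "American Airlines logo"),
}

STRONG = re.compile(r"\*\*(.+?)\*\*")                # **text** -> strong emphasis
IMAGE = re.compile(r"\[\[image:([a-z0-9-]+)\]\]")    # [[image:id]] -> vetted image


def render(source):
    """Render the hypothetical wiki markup to an HTML paragraph.

    Authors can request strong emphasis but not boldface, and can reference
    images only by ID from the vetted library.
    """
    text = html.escape(source)  # authors cannot inject raw HTML such as <b>

    def image(match):
        key = match.group(1)
        if key not in VETTED_IMAGES:
            raise ValueError(f"image {key!r} is not in the vetted library")
        src, alt = VETTED_IMAGES[key]
        return f'<img src="{html.escape(src)}" alt="{html.escape(alt)}">'

    text = STRONG.sub(r"<strong>\1</strong>", text)
    text = IMAGE.sub(image, text)
    return f"<p>{text}</p>"


print(render("Book now with **no change fees** [[image:aa-logo]]"))
```

Run on the sample line, this produces a paragraph containing a <strong> element and an <img> whose alt text is "American Airlines logo". The design choice is that accessibility decisions live in one small, reviewable renderer and image table rather than in every page an author writes.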
Appendix A describes challenges with applying the WCAG 2.x conformance model to specific Guidelines and Success Criteria, primarily based on required human involvement in evaluation of conformance to them. In this draft, the list is not exhaustive, but we intend it to cover all known challenges with all A and AA Success Criteria, by the time this Note is completed.
Large websites often have complex content publishing pipelines, which may render content dynamically depending upon a large number of variables (such as what is known about the logged-in user and her content preferences, the geographical location from which the user is visiting the site, and the capabilities of the web rendering agent being used). Each of these variables can affect whether a particular rendering of the content, at a particular moment, conforms, and it may not be possible to validate every possible publishing permutation with a page-level test.
Approaches used in quality assurance and quality engineering of software code can be applied to software that generates dynamic web content. One of these is Unit Testing (or Component Testing), where each individual step in the content publishing pipeline is tested independently of the others, with a broad range of possible valid and invalid content. Another is Integration Testing, where the individual Units or Components of the software are combined in pairs to validate their interoperability across a wide range of possible interactions. These and other approaches for quality assurance of complex software systems are effective tools for minimizing the number of bugs in the system, though there is no guarantee of finding all potential bugs or of assuring that a software system is bug-free.
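A minimal sketch of unit testing one pipeline component against many rendering permutations, using Python's standard unittest and itertools. The render_card component, the CardInspector helper, and the input grid (titles, alt text, locales, a high-contrast preference) are hypothetical stand-ins for a real pipeline's variables. itertools.product enumerates every combination of the chosen values, which also shows how quickly permutations multiply: four small lists already yield 24 cases, and a real pipeline has far more variables, so a test grid like this remains a sample, not proof of conformance.

```python
import html
import itertools
import unittest
from html.parser import HTMLParser


def render_card(title, image_alt, locale, high_contrast):
    """Hypothetical publishing-pipeline component that renders a product card."""
    css_class = "card high-contrast" if high_contrast else "card"
    button_label = {"en": "Buy", "fr": "Acheter"}.get(locale, "Buy")  # fallback label
    return (
        f'<div class="{css_class}" lang="{html.escape(locale)}">'
        f"<h2>{html.escape(title)}</h2>"
        f'<img src="product.png" alt="{html.escape(image_alt)}">'
        f"<button>{button_label}</button></div>"
    )


class CardInspector(HTMLParser):
    """Collects the alt text of images and the visible text of buttons."""

    def __init__(self):
        super().__init__()
        self.alts = []
        self.button_texts = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.alts.append(dict(attrs).get("alt"))
        elif tag == "button":
            self._in_button = True
            self.button_texts.append("")

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.button_texts[-1] += data


class RenderCardTest(unittest.TestCase):
    # A small, deliberately chosen grid of inputs, including an unknown locale
    # and text containing characters that must be escaped.
    TITLES = ["Running shoes", "Tom & Jerry <DVD>"]
    ALTS = ["Red running shoe, side view", 'Poster: "Tom & Jerry"']
    LOCALES = ["en", "fr", "xx"]
    CONTRAST = [False, True]

    def test_every_permutation_has_alt_text_and_button_label(self):
        grid = itertools.product(self.TITLES, self.ALTS, self.LOCALES, self.CONTRAST)
        for combo in grid:
            with self.subTest(combo=combo):
                inspector = CardInspector()
                inspector.feed(render_card(*combo))
                inspector.close()
                self.assertEqual(len(inspector.alts), 1)
                self.assertTrue(inspector.alts[0])  # present and non-empty
                self.assertTrue(all(t.strip() for t in inspector.button_texts))
```

Saved to a file, this can be run with python -m unittest; each failing combination is reported separately thanks to subTest.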
Very large, highly dynamic web sites generally aggregate content provided by multiple entities. Many of these are third parties with the ability to add content directly to the website — including potentially every website visitor. While the website can provide guidance on how to post content so that it meets accessibility guidance, it is ultimately up to those third parties to understand and correctly implement that guidance. And as noted above, even with automated checking prior to accepting the post, many Guidelines and Success Criteria expect human validation involvement.
Copyright and similar constraints that restrict the ability to modify or impose requirements on third party data can also make full page conformance impossible to assure, e.g. articles that allow reposting but without modification due to copyright restrictions.
WCAG 2.x speaks to 3rd party content and conformance in the context of a Statement of Partial Conformance. [wcag21] It provides two options for such content; pages with 3rd party content may:
- be determined to conform "based on best knowledge," for example by monitoring and repairing non-conforming content within 2 business days; or
- carry a statement of partial conformance, stating that the page does not conform but could conform if certain parts were removed.
The provision of monitoring and required repair within a 2 business day window doesn't address the underlying challenge of pages with (3rd party) content that may be updating tens or hundreds of times within that 2 day window. For large websites with hundreds of thousands of pages or more with a significant amount of 3rd party content, the necessity for human involvement in the evaluation of 3rd party content doesn't scale.
A statement of partial conformance doesn't address the underlying challenge of improving the usability of 3rd party content. While it might allow a web page/website visitor to be forewarned in advance that they should anticipate encountering inaccessible content, it does little to practically enable large sites to address such concerns.
There are several approaches to the challenge of 3rd party content accessibility. Where that content is provided as part of a commercial contract, the terms of the contract may include requirements around accessibility. Where the content comes from potentially any website visitor or user, those visitors can be constrained in the types of contributions they can make (e.g. prevented from using non-semantic text attributes like boldface), or prompted to add accessibility metadata (e.g. so that images are uploaded along with ALT text), or reminded about good accessibility practices (e.g. told not to refer to sensory characteristics in their text). Such approaches may add substantial friction, especially to individual website visitors and users, to the point that they may be disinclined to make contributions. Further, they aren't a guarantee of perfection in those contributions (Is the ALT text thoughtfully authored, or just filler text to make the upload happen? Did the text truly avoid reference to sensory characteristics?). Nonetheless, these are some approaches to consider to help minimize the amount of 3rd party content that poses accessibility challenges, which may also offer a great opportunity to help educate users and provide a teachable moment that would help make accessibility practices more ubiquitous.
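A minimal sketch of prompting contributors at upload time, in Python. The review_image_upload function, the UploadReview result type, the FILLER_ALT word list, and the SENSORY_PHRASES pattern are all hypothetical heuristics chosen for illustration. Missing ALT text blocks the upload; filler-looking ALT text and captions that rely on color or position produce a reminder but are still accepted. As the paragraph above notes, checks like these cannot tell whether ALT text was thoughtfully authored; they only catch the most obvious cases and create a teachable moment.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heuristics; a real site would tune these to its own content.
FILLER_ALT = {"image", "photo", "picture", "img", "graphic", "untitled"}
SENSORY_PHRASES = re.compile(
    r"\b(click the (red|green|blue|round|square)\b"
    r"|on the (left|right)\b"
    r"|shown (above|below)\b)",
    re.IGNORECASE,
)


@dataclass
class UploadReview:
    accepted: bool
    prompts: list = field(default_factory=list)  # messages shown to the contributor


def review_image_upload(filename, alt_text, caption=""):
    """Review a user-contributed image upload before it is published."""
    alt = (alt_text or "").strip()
    if not alt:
        # Hard requirement: the upload is rejected until ALT text is supplied.
        return UploadReview(False, ["Please describe this image (ALT text is required)."])

    prompts = []
    stem = filename.rsplit(".", 1)[0].lower()
    if alt.lower() in FILLER_ALT or alt.lower() == stem:
        prompts.append("This ALT text looks like filler; describe what the image shows.")
    if SENSORY_PHRASES.search(caption):
        prompts.append("Avoid referring to color, shape, or position alone.")
    return UploadReview(True, prompts)


print(review_image_upload("IMG_0042.jpg", "img_0042"))
```

Whether a heuristic rejects or merely reminds is the friction trade-off described above: rejecting more often catches more problems but may discourage contributions.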
The core principles, and many of the guidelines, contained in WCAG 2.x are broadly applicable outside of the web context. For example, no matter the technology, information presented to humans needs to be perceivable by them in order for them to access and use it. At the same time, some of the specific guidelines, and especially some of the individual success criteria, of WCAG 2.x are written specifically for web content and web technologies, and may be difficult to apply to non-web information and communications technologies (as set forth in the W3C Note Guidance on Applying WCAG to non-web Information and Communication Technologies (WCAG2ICT)). [wcag2ict] Furthermore, the state of programmatic test tools for assessing whether non-web information and communications technologies meet various WCAG 2.x success criteria varies widely with the type of non-web document, the operating system, and the user interface toolkits used to create the non-web software. In no case that we are aware of do such tools explicitly map the accessibility issues they find to specific WCAG 2.x success criteria. Therefore, for some documents or software it may not be possible to use any programmatic accessibility evaluation tools for any success criterion; conformance to each and every success criterion will need human expertise and judgment.
There are various approaches to the challenges of accessibility guidelines conformance and testing of non-web information and communications technologies, depending upon the nature of the non-web document technology or non-web software in question. For example, there is a rich set of techniques describing how to meet WCAG success criteria for Adobe PDF documents, and Adobe Acrobat includes a PDF accessibility assessment tool that also helps authors mitigate the accessibility failings it finds. Both Microsoft Word and OpenOffice include accessibility test and validation tools, and both are capable of saving or exporting their content as accessible PDF documents. Operating systems like macOS, iOS, Windows, Android, and Fire OS all have rich accessibility frameworks that can be used to build software applications that work with built-in assistive technologies like screen readers and screen magnifiers, as well as, in some cases, 3rd party assistive technologies. Further, there are a variety of accessibility test tools and support systems to help software developers make accessible software applications for these operating systems. While none of these tools and frameworks can find all accessibility issues or otherwise guarantee that a document or application will be free of all accessibility defects (whether or not those defects are tied to a specific WCAG success criterion), they are nonetheless important tools that can lead to highly accessible documents and software.
Appendix B contains "Detailed Challenges with Conformance Verification and Testing for non-web ICT." It covers 12 success criteria out of the 38 A and AA criteria in WCAG 2.0 which can be applied to non-web ICT after replacing specific terms or phrases. In future versions of the document we plan to address success criteria introduced in WCAG 2.1 that may pose specific challenges for conformance verification and testing in the non-web ICT context.
Now known as W3C Accessibility Guidelines (WCAG 3.0), this iteration of W3C accessibility guidance was conceived and designed to be research-based. Working over many years, the Silver Task Force of the Accessibility Guidelines Working Group (AGWG) and the Silver Community Group collaborated with researchers on questions that the Silver groups identified. This research was used to develop 11 problem statements that needed to be solved for Silver. The detailed problem statements include the specific problem, the result of the problem, the situation and priority, and the opportunity presented by the problem. The problem statements were organized into three main areas: Usability, Conformance, and Maintenance. The following section is taken from the Conformance sections of the Silver Design Sprint Final Report and the Silver Problem Statements. Details of the research questions and the individual reports are in the Research Archive of the Silver wiki.
Originally published as the Silver Design Sprint Final Report (2018). These problem statements were presented to the Silver Design Sprint participants.
"What is Strictly Testable" provides an obstacle to including guidance that meets the needs of people with disabilities but is not conducive to a pass/fail test.
Originally published as Silver Problem Statements, this was a detailed analysis of the research results behind the above list.
Conformance to a standard means that you meet or satisfy the "requirements" of the standard. In WCAG 2.0 the "requirements" are the Success Criteria. To conform to WCAG 2.0, you need to satisfy the Success Criteria; that is, there is no content which violates the Success Criteria.
WCAG 2.0 Conformance Requirements:
- Conformance Level (A to AAA)
- Conformance Scope (For full web pages only, not partial)
- Complete Process
- Only "Accessibility-supported" ways of using technologies
- Non-Interference: Technologies that are not accessibility supported can be used, as long as all the information is also available using technologies that are accessibility supported and as long as the non-accessibility-supported material does not interfere.
Reliably Human Testable
not reliably testable (Brajnik et al., 2012): average agreement was at the 70-75% mark, while the error rate was around 29%.
accessibility supported ways of using technologies
Specific problem: Certain success criteria are quite clear and measurable, like color contrast. Others, far less so. The entire principle of "Understandable" is critical for people with cognitive disabilities, yet success criteria intended to support the principle are not easy to test for or clear on how to measure. As a simple example, there is no clear, recent, or consistent definition (within any locale or language) of what "lower secondary education level" means in regard to web content. Language and text content are also not the only challenges for people with cognitive and learning disabilities. Compounding this, most of the existing criteria in support of understanding are designated as AAA, which relatively few organizations attempt to conform with.
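To illustrate why color contrast counts as clear and measurable, here is a minimal Python sketch of the contrast ratio as WCAG 2.x defines it: each sRGB color's relative luminance L is computed from linearized channel values, and the ratio is (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter of the two. Success Criterion 1.4.3 Contrast (Minimum), at level AA, requires at least 4.5:1 for normal-size text. The function names are this sketch's own, and it deliberately ignores transparency and the lower threshold for large text.

```python
def _linearize(channel_8bit):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x constants)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colors; order of arguments does not matter."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(fg, bg):
    """True if the pair meets the 4.5:1 minimum for normal-size text."""
    return contrast_ratio(fg, bg) >= 4.5


white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 2))
print(meets_aa_normal_text((118, 118, 118), white), meets_aa_normal_text((119, 119, 119), white))
```

For example, mid-gray #767676 (118, 118, 118) on white works out to roughly 4.54:1 and passes, while the slightly lighter #777777 comes to roughly 4.48:1 and fails. No comparable formula exists for whether text is written at a "lower secondary education level," which is the contrast the paragraph above draws.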
Result of problem: The requirement for valid and reliable testability for WCAG success criteria presents a structural barrier to including the needs of people with disabilities whose needs are not strictly testable. Guidance that WCAG working group members would like to include cannot be included. The needs of people with disabilities – especially intellectual and cognitive disabilities – are not being met.
Situation and Priority: Of the 70 new success criteria proposed by the Cognitive Accessibility Task Force to support the needs of people with cognitive and intellectual disabilities, only four to six (depending on interpretation) were added to WCAG 2.1 and only one is in level AA. The remainder are in level AAA, which is rarely implemented. This means user needs are not met.
Opportunity: Multiple research projects and audience feedback have concluded that simpler language is desired and needed for audiences of the guidelines. Clear but flexible criteria with considerations for a wider spectrum of disabilities help ensure more needs are met.
Specific problem: Regardless of proficiency, there is a significant gap in how any two human auditors will identify a success or fail of criteria. Various audiences have competing priorities when assessing the success criteria of any given digital property. Knowledge varies for accessibility standards and how people with disabilities use assistive technology tools. Ultimately, there is variance between any two auditors and between any two authors of test cases, compounded by human bias. Some needs of people with disabilities are difficult to measure in a quantifiable way.
Result of problem: Success criteria are measured by different standards and by people who often make subjective observations. Because there's so much room for human error, an individual may believe they've met a specific conformance level when, in reality, that's not the case. The ultimate impact is on an end user with a disability who cannot complete a given task because the success criterion wasn't properly identified, tested and understood.
Situation and Priority: There isn't a standardized approach to how the conformance model applies to success criteria at the organizational level and in specific test case scenarios.
Opportunity: There's an opportunity to make the success criteria clearer for human auditors and testers. Educating business leaders on how the varying levels of conformance apply to their organization may be useful as well. We can also provide education about the ways that people with disabilities use their assistive technology.
"Accessibility supported" was never fully implemented in a way that was clear and useful to developers and testers. It also requires a harmonious relationship and persistent interoperability between content technologies and requesting technologies, which must be continuously evaluated as either is updated. Further, the WG defers to the community the judgment of how much, how many, or which AT must support a technology. It is poorly understood, even by experts.
Result of problem: Among the results are: difficulty understanding what qualifies as a content technology or an assistive technology; difficulty quantifying assistive technologies or features of user agents; claiming conformance with inadequate assistive technology; and difficulty claiming conformance.
Situation and Priority: Any claim or assertion that a web page conforms to the guidelines may require an explicit statement defining which assistive technology and user agent(s) the contained technologies rely upon, presumably including specific versions and/or release dates of each. One could then infer that a conformance claim is dependent upon a software compatibility claim naming browsers and assistive technology and their respective versions. This would create a burden to author and govern such claims. Additionally, no one can predict new technologies and their rates of adoption by people with disabilities.
Opportunity: As the technologies in this equation evolve, the interoperability may be affected by any number of factors outside of the control of the author and publisher of a web page. Either "accessibility supported" should not be a component of conformance requirements, or it should clearly, concisely and explicitly define and quantify the technologies or classes of technologies, AND set any resulting update or expiry criteria for governance.
Specific problem: Evolving Technology: As content technology evolves, it must be re-evaluated against assistive technology for compatibility. Likewise, as assistive technology evolves or emerges, it must be evaluated against the backward compatibility of various content technology.
Result of problem: There is no versioning consideration for updates to user agents and assistive technology. Strict conformance then typically has an expiry.
Situation and Priority: There is no clear and universal understanding of the conformance model or its longevity. Some will infer that there is always a conformance debt when any technology changes.
Opportunity: Consider conformance statements to include an explicit qualifier of time of release or versions of technology. OR consider a more general approach that is not explicit and is flexible to the differences in technologies as they evolve, identifying the feature of the assistive tech rather than the version of the assistive tech. OR consider a model that quantifies conformance as a degree of criteria met.
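To make the versioning and "degree of criteria met" options above more concrete, here is a minimal Python sketch of a conformance claim record that names the assistive technology and user agent versions it was tested with, reports the share of applicable criteria met, and lists deployed versions the claim never covered. The ConformanceClaim class, its fields, and any product names used with it are hypothetical illustrations; WCAG 2.x defines no such data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformanceClaim:
    """Hypothetical claim record; not a format defined by WCAG 2.x."""
    level: str                                # e.g. "AA"
    criteria_met: frozenset[str]              # success criterion numbers met
    criteria_applicable: frozenset[str]       # criteria that apply to the content
    tested_with: frozenset[tuple[str, str]]   # (product, version) pairs tested

    def degree(self) -> float:
        """Share of applicable criteria met (the 'degree' model)."""
        if not self.criteria_applicable:
            return 1.0
        met = self.criteria_met & self.criteria_applicable
        return len(met) / len(self.criteria_applicable)

    def untested(self, deployed) -> set[tuple[str, str]]:
        """(product, version) pairs in use that the claim never covered,
        i.e. where a version-bound claim would need re-evaluation."""
        return set(deployed) - self.tested_with
```

A non-empty result from untested() is one way of expressing the "expiry" of a version-bound claim; the degree() value expresses the alternative, non-binary model.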
This appendix describes challenges with applying the WCAG 2.x conformance model to specific Guidelines and Success Criteria, primarily based on required human involvement in evaluation of conformance to them. In this draft, the list is not exhaustive, but we intend it to cover all known challenges with all A and AA Success Criteria, by the time this Note is completed. The purpose of this is not to critique WCAG 2 nor to imply that sites and policies should not do their best, and strive to conform to it, but rather to indicate known areas for which it may not be possible to conform, and which a new conformance model would hopefully address.
We have seen the market respond to the increased demand for accessibility professionals, in part due to the amount of human involvement required in the evaluation of conformance, with many international efforts such as the International Association of Accessibility Professionals (IAAP), which trains and/or certifies accessibility professionals. While this is resulting in downward pressure on the costs of testing and remediation as more accessibility professionals become available to meet the need, it doesn't in and of itself eliminate the challenges noted below. Furthermore, for the near term, it appears the demand will be greater than the supply of this type of specialized expertise.
Also, the Website Accessibility Conformance Evaluation Methodology (WCAG-EM) 1.0 [wcag-em] lays out a strategy to combine human testing and automated testing. In the model, automation is used for a large number of pages (or all pages) and sampling is used for human testing. The WCAG-EM suggests that the human evaluation sample might include template pages, component libraries, key flows (such as choosing a product and purchasing it, or signing up for a newsletter, etc.), and random pages. Although these strategies provide a useful methodology for gaining confidence in a prior claim of 100% conformance across a website, or for helping an organization assess its progress toward 100% conformance as part of an internal process, they don't in and of themselves address the challenges in making every last aspect of every page conform 100% to every success criterion.
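The division of labor described above can be sketched in Python: every page is queued for automated checks, while the human sample combines template pages, the pages of key flows, and a few randomly chosen pages. The build_evaluation_plan function and its parameters are hypothetical; this illustrates the idea only and is not the WCAG-EM procedure itself.

```python
import random
from dataclasses import dataclass


@dataclass
class EvaluationPlan:
    automated: list[str]     # every page is queued for automated checks
    human_sample: list[str]  # pages reserved for human evaluation


def build_evaluation_plan(all_pages, templates, key_flows, random_count, seed=0):
    """Hypothetical helper: automated checks on all pages, human checks on a sample."""
    structured = []
    # Template pages first, then every page of every key flow, without duplicates.
    for page in list(templates) + [p for flow in key_flows for p in flow]:
        if page not in structured:
            structured.append(page)
    # Add a reproducible random sample drawn from the remaining pages.
    remaining = [p for p in all_pages if p not in structured]
    rng = random.Random(seed)
    extra = rng.sample(remaining, min(random_count, len(remaining)))
    return EvaluationPlan(automated=list(all_pages), human_sample=structured + extra)
```

Seeding the random generator keeps the sample reproducible between evaluation rounds; nothing in this sketch addresses the underlying problem that unsampled pages receive no human review.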
Text alternatives for images are an early, and still widely used, accessibility enhancement to HTML. Yet text alternatives remain one of the more intractable accessibility guidelines to assess with automated accessibility checking. While testing for the presence of alternative text is straightforward, and a collection of specific errors (such as labeling a graphic "spacer.gif") can be identified by automated testing, human judgment remains necessary to evaluate whether or not any particular text alternative for a graphic is correct and conveys the true meaning of the image. Image recognition techniques are not mature enough to fully discern the underlying meaning of an image and the intent of the author in its inclusion. As a simple example, an image or icon of a paper clip would likely be identified by image recognition simply as a "paper clip". However, when a paper clip appears in content, its meaning is often to show that there is an attachment. In this specific example, the alternative text should be "attachment" rather than "paper clip". Similarly, the image of a globe (or any graphical object representing planet Earth) can be used for a multiplicity of reasons, and the appropriate alternative text should indicate the reason for that use rather than descriptive wording such as "Planet Earth". One not uncommon use of a globe today is as a control that expands to allow users to select their preferred language, but there may be many other reasonable uses of such an icon.
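The automatable part of the text alternative check can be sketched with Python's standard html.parser module: flag images with no alt attribute and alt text that looks like a file name. The file-name pattern is an assumed heuristic, and the sketch shows the limit described above, since alt="paper clip" on an attachment icon passes even though a human would reject it.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: alt text that is just an image file name, e.g. "spacer.gif".
FILENAME_ALT = re.compile(r"^\S+\.(gif|jpe?g|png|svg|webp)$", re.IGNORECASE)


class AltTextChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []  # (src, problem) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src", "?")
        alt = attrs.get("alt")
        if alt is None:
            self.findings.append((src, "missing alt attribute"))
        elif FILENAME_ALT.match(alt.strip()):
            self.findings.append((src, "alt text looks like a file name"))
        # Anything else passes, including alt="" for decorative images and
        # alt="paper clip" where "attachment" is meant: judging meaning
        # still needs a human.


def check_alt_text(html):
    checker = AltTextChecker()
    checker.feed(html)
    return checker.findings
```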
Practices for creating alternatives to spoken dialog, and to describe visual content, were established in motion picture and TV content well before the world wide web came into existence. These practices formed the basis of the Media Accessibility User Requirements (MAUR) [media-accessibility-reqs] for time-based streaming media on the web in HTML5, which now supports both captioning and descriptions of video.
Yet, just as with text alternatives, automated techniques and testing aren't sufficient for creating and validating accessible alternatives to time-based media. For example, Automatic Speech Recognition (ASR) often fails when the speech portion of the audio is low quality or unclear, or has background noise or sound effects. In addition, current automated transcript creation software doesn't perform speaker identification, meaningful sound identification, or correct punctuation, all of which are necessary for accurate captioning. Work on automatically generated descriptions of video is in its infancy and, like image recognition techniques, doesn't provide usable alternatives to video.
Similarly, while there is well articulated guidance on how to create text transcripts or captions for audio-only media (such as radio programs and audio books), automated techniques and testing again aren't sufficient for creating and validating these accessible alternatives. Knowing what is important in an audio program to describe to someone who cannot hear is beyond the state of the art. There are several success criteria under this Guideline that all share these challenges of manual testing being required to ensure alternatives accurately reflect the content in the media. These include:
Whether in print or online, the presentation of content is often structured in a manner intended to aid comprehension. Sighted users perceive structure and relationships through various visual cues. Beyond simple sentences and paragraphs, the sighted user may see headings with nested subheadings. There may be sidebars and inset boxes of related content. Tables may be used to show data relationships. Comprehending how content is organized is a critical component of understanding the content.
As with media above, automated testing can determine the presence of structural markup, and can flag certain visual presentations as likely needing that structural markup. But such automated techniques remain unable to decipher if that markup usefully organizes the page content in a way that a user relying on assistive technology can examine the page systematically and readily understand its content.
Often the sequence in which content is presented affects its meaning. Some content may even have more than one meaningful order. However, as with Info and Relationships above, automated techniques are unable to determine whether content will be presented to screen reader users in a meaningful sequence. For example, the placement of a button used to add something to a virtual shopping cart is very important for screen reader users, as improper placement can lead to confusion about which item is being added.
Ensuring that no instructions rely on references to sensory characteristics presents similar challenges to ensuring that color isn't the sole indicator of meaning (Success Criterion 1.4.1) – it is testing for a negative, and requires a deep understanding of meaning conveyed by the text to discern a failure programmatically. For example, while instructions such as "select the red button" reference a sensory characteristic, "select the red button which is also the first button on the screen" may provide sufficient non-sensory context to not cause a problem (and multi-modal, multi-sensory guidance is often better for users with cognitive impairments or non-typical learning styles).
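A naive automated check makes the difficulty concrete. The Python sketch below flags any instruction containing a word from an assumed, deliberately small list of sensory terms; it flags both example instructions above, because keyword matching cannot tell that the second one also supplies non-sensory context.

```python
import re

# Assumed, deliberately naive list of sensory words; real content would need far more.
SENSORY_TERMS = re.compile(
    r"\b(red|green|blue|round|square|left|right|above|below|top|bottom)\b",
    re.IGNORECASE,
)


def flag_sensory_instructions(instructions):
    """Return the instructions that mention a sensory characteristic."""
    return [text for text in instructions if SENSORY_TERMS.search(text)]
```

Deciding which flagged instruction is actually a failure still requires a human to read it in context.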
While an automated test can determine that the orientation is locked, full evaluation of conformance to this criterion is tied to whether it is "essential" for the content to be locked to one specific orientation (e.g. portrait or landscape views of an interface rendered to a cell phone). Determining conformance therefore requires human judgment to confirm that, whenever the orientation is locked, that orientation is essential to the content; this is not yet fully automatable.
An automated test can easily determine that input fields use HTML markup to indicate the input purpose; however, manual verification is needed to determine that the correct markup was used to match the intent of the field. For example, for a name input field there are 10 variations of HTML name purpose values (such as given-name and family-name) with different meanings, and using the incorrect one would be confusing to the user.
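The gap between the syntactic and semantic checks can be sketched in Python. The function below verifies only that each field's autocomplete value is a known token. The token set is a small, assumed subset of the HTML autofill tokens, and the sketch ignores section-, shipping and billing prefixes. A "Last name" field marked given-name passes, which is exactly the mismatch that still needs a human.

```python
# Assumed partial subset of HTML autofill tokens; the full list is much longer.
NAME_TOKENS = {"name", "honorific-prefix", "given-name", "additional-name",
               "family-name", "honorific-suffix", "nickname"}
OTHER_TOKENS = {"email", "tel", "street-address", "postal-code", "country-name", "bday"}
KNOWN_TOKENS = NAME_TOKENS | OTHER_TOKENS


def check_input_purpose(fields):
    """fields: list of (visible_label, autocomplete_value) pairs.
    Returns the labels whose autocomplete value is missing or unknown.
    It cannot tell whether a known token matches the label's intent."""
    problems = []
    for label, token in fields:
        if not token or token.strip().lower() not in KNOWN_TOKENS:
            problems.append(label)
    return problems
```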
This poses the same challenges as Sensory Characteristics (Success Criterion 1.3.3). To discern whether a page fails this criterion programmatically requires understanding the full meaning of the related content on the page and whether any meaning conveyed by color is somehow also conveyed in another fashion (e.g. whether the meaning of the colors in a bar chart is conveyed in the body of associated text or with a striping/stippling pattern as well on the bars, or perhaps some other fashion).
An automated test tool would be able to identify media/audio content in a website, identify whether auto-play is turned on in the code, and also determine the duration. However, an automated test tool cannot determine whether there is a mechanism to pause or stop the audio, or to adjust its volume independently of the overall system volume level. This still requires manual validation.
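The detection step can be sketched with Python's html.parser: collect audio and video elements that carry the autoplay attribute and are not muted. This is an assumed simplification. It does not read media duration (the criterion concerns audio playing for more than 3 seconds), and it cannot tell whether a pause, stop, or volume mechanism actually works.

```python
from html.parser import HTMLParser


class AutoplayFinder(HTMLParser):
    """Collects <audio>/<video> elements that autoplay without being muted."""

    def __init__(self):
        super().__init__()
        self.autoplaying = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("audio", "video"):
            return
        names = {name for name, _ in attrs}  # boolean attributes have value None
        if "autoplay" in names and "muted" not in names:
            # Fall back to the tag name when sources are given via <source> children.
            self.autoplaying.append(dict(attrs).get("src", tag))


def find_autoplaying_media(html):
    finder = AutoplayFinder()
    finder.feed(html)
    return finder.autoplaying
```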
Automated tools can check the color of text against the background in most cases. However, there are several challenges with using current state of the art automated tools for this success criterion, including (1) when background images are used, automated tests aren't reliably able to check for minimum contrast of text against the image, especially if the image is a photograph or drawing where the text is placed over the image, and (2) situations in which, depending on context, text is exempt, for example as incidental text that is part of an inactive user interface component or is purely decorative, or as part of a logo. These cases take human intervention to sample the text and its background and determine whether the contrast meets the minimum requirement.
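The automatable core of this check is the contrast ratio between two solid colors, which WCAG 2.x defines in terms of relative luminance. Below is a minimal Python sketch of that formula with the Success Criterion 1.4.3 thresholds (4.5:1 for normal text, 3:1 for large text). It deliberately has no notion of background images, incidental text, or logos, which is where the human judgment described above is needed.

```python
def _linearize(channel_8bit):
    """Convert an sRGB channel (0-255) to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """(L1 + 0.05) / (L2 + 0.05), where L1 is the lighter of the two colors."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


def meets_aa(fg, bg, large_text=False):
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, gray #777777 on white comes out just under 4.5:1, while #767676 just clears it.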
While automated tools can test whether it is possible to resize text on a webpage, it takes human evaluation to determine whether there has been a loss of content or functionality as a result of the text resizing.
This poses the same challenge as Orientation (Success Criterion 1.3.4): it is tied to whether it is "essential" for text to be part of an image. This requires human judgment, making this criterion not readily automatable. Additionally, OCR methods applied to images cannot reliably discern overlapping text in different fonts, recognize unusual characters, or read text with poor contrast against the background of the image.
While automated tests can detect the presence of vertical and horizontal scroll bars, there are currently no reliable tests to automate validating that there has been no loss in content or functionality. Human evaluation is also still needed to determine when two-dimensional scrolling is needed for content that requires two-dimensional layout for usage or meaning.
This success criterion requires several levels of checks that are difficult or impossible to automate as it allows for exceptions which require human intervention to examine the intent and potentially employ exceptions to comply with the guideline. Automated checks would have to include:
essential to utilize the exception, which requires human intervention.
This success criterion involves using a tool or method to modify text spacing and then checking that no content is truncated or overlapping. There is currently no way to reliably automate validating that no loss of content or functionality has occurred when text spacing has been modified.
Because the additional content must first be surfaced, by hovering with a mouse pointer or moving keyboard focus, before it can be checked against the criterion's three conditions (dismissible, hoverable, and persistent), this test currently requires human evaluation.
While an automated test can evaluate whether a page can be tabbed through in its entirety, ensuring keyboard operability of all functionality currently requires a human to manually navigate through content to ensure all interactive elements are not only in the tab order, but can be fully operated using keyboard controls.
Character key shortcuts can be applied to content via scripting but whether and what these shortcut key presses trigger can only be determined by additional human evaluation.
There is currently no easy way to automate checking whether timing is adjustable. Mechanisms for adjusting time limits differ in naming, position, and approach (including dialogs/popups shown before the time-out). The result can also be affected by how the server registers user interactions (e.g. for automatically extending the time-out).
The requirement to pause, stop, or hide moving content is typically met by interactive controls placed in the vicinity of the moving content, or occasionally at the beginning of the content. Since position and naming vary, this assessment cannot currently be automated (it involves checking that the function works as expected).
There are currently no known automated tests that can accurately assess areas of flashing on a web page to ensure that nothing flashes more than three times in any one-second period.
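The counting step itself is simple once flashes have been located: given the times at which flashes occur, a sliding window finds the largest number of flashes in any one-second period. The Python sketch below assumes those timestamps are already available, and that is the hard part. Deciding what counts as a flash, and applying the general and red flash area thresholds, is not modeled here.

```python
from collections import deque


def max_flashes_per_second(flash_times):
    """flash_times: sorted timestamps in seconds at which a flash occurred
    (assumed to come from some separate detection step).
    Returns the largest number of flashes in any one-second window."""
    window = deque()
    best = 0
    for t in flash_times:
        window.append(t)
        # Drop flashes that fall a full second or more before the current one.
        while t - window[0] >= 1.0:
            window.popleft()
        best = max(best, len(window))
    return best


def exceeds_three_flashes(flash_times):
    return max_flashes_per_second(flash_times) > 3
```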
While it can be determined that native elements or landmark roles are used, there is currently no automated way to determine whether they are used to adequately structure content (for example, whether sections that should be included have been missed). The same assessment would be needed when other techniques are used (structure by headings, skip links).
Automating a check for whether the page has a title is simple; ensuring that the title is meaningful and provides adequate context as to the purpose of the page is not currently possible.
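The simple half of this check can be sketched with Python's html.parser: extract the first title element and test that it is not empty. As the sketch shows, a page titled "Untitled" passes, because judging whether a title is meaningful is beyond this kind of test.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Captures the text of the first <title> element, if any."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def has_nonempty_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    return bool(parser.title and parser.title.strip())
```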
There is currently no known way to automate ensuring that focus handling with dynamic content (e.g. moving focus into a custom dialog, keeping focus within the dialog, and returning focus to the trigger) follows a logical order.
Automated tests can validate whether pages can be reached in multiple ways (e.g. navigation and search), but will miss cases where an exception holds (all pages can be reached from anywhere), and so still require human validation.
Automated tests can detect the existence of headings and labels; however, there is currently no way to automate determining whether a heading or label provides adequate context for the content that follows.
There are currently no known automated checks that would accurately detect complex gestures. Even when a script indicates the presence of particular events like touchstart, the behavior the event triggers would still need to be checked by human evaluation.
When mouse-down events are used (which can be detected automatically), checking for one of the four options that ensure the functionality is accessible (no down-event, abort or undo, up reversal, or essential) requires human evaluation.
Motion-activated events may be detected automatically, but whether there are equivalents for achieving the same thing with user interface components currently requires human evaluation.
There is currently no reliable way to accurately automate checking whether a change caused by moving focus should be considered a change of content or context.
There is currently no reliable way to accurately automate checking whether changing the setting of any user interface component should be considered a change of content or context, or to automatically detect whether relevant advice exists before using the component in question.
Determining whether an error message identifies and describes the error accurately, and in a way that provides adequate context, currently requires human evaluation.
Edge cases (labels close enough to a component to be perceived as a visible label) will require a human check. Some labels may be programmatically linked but hidden or visually separated from the element to which they are linked. Whether instructions are necessary and need to be provided will hinge on the content. Human check needed.
Whether an error suggestion is helpful or correct currently requires human evaluation.
Incorrect use of ARIA constructs can be detected automatically but constructs that appear correct may still not work, and widgets that have no ARIA (but need it to be understood) can go undetected. Human post-check of automatic checks is still necessary.
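Both halves of this statement can be seen in a small Python sketch built on html.parser: a misspelled role is caught, but a clickable div with no role at all produces no finding. The role set is an assumed subset of valid WAI-ARIA roles, and only the first role token is checked because ARIA allows fallback tokens after it.

```python
from html.parser import HTMLParser

# Assumed small subset of valid WAI-ARIA role values, for illustration only.
KNOWN_ROLES = {"button", "link", "checkbox", "dialog", "navigation", "main",
               "banner", "contentinfo", "tab", "tablist", "tabpanel",
               "menu", "menuitem", "slider", "textbox"}


class RoleChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.unknown_roles = []  # (tag, role) pairs

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if role is None:
            return  # e.g. <div onclick=...> with no role passes silently
        tokens = role.lower().split()
        if tokens and tokens[0] not in KNOWN_ROLES:
            self.unknown_roles.append((tag, tokens[0]))


def find_unknown_roles(html):
    checker = RoleChecker()
    checker.feed(html)
    return checker.unknown_roles
```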
As noted in Challenge #4 Non-Web Information and Communications Technologies above, 18 success criteria out of the 38 A and AA criteria in WCAG 2.0 could be applied to non-web ICT only after replacing specific terms or phrases. Four of those 18 (2.4.1, 2.4.5, 3.2.3, and 3.2.4) related to either a set of web pages or multiple web pages, which is more difficult to characterize for non-web ICT. Another 4 are the "non-interference" set (1.4.2, 2.1.2, 2.2.2, and 2.3.1), which need further special consideration as they would apply to an entire software application. The remaining 10 were more straightforward to apply to non-web ICT, but still required some text changes.
Since the publication of WCAG2ICT [wcag2ict], WCAG 2.1 has been published, introducing a number of additional success criteria at the A and AA levels. Some of these may also pose specific challenges for conformance verification and testing in the non-web ICT context. A future version of this document will address those new success criteria in the non-web ICT context.
The 18 success criteria noted in WCAG2ICT are discussed below in four sections, the last of which addresses the 14 of the 38 A and AA criteria in WCAG 2.0 which relate to an accessibility supported interface, which may not be possible for software running in a "closed" environment (e.g. an airplane ticket kiosk).
Set of Web Pages Success Criteria
These four success criteria include either the term "set of pages" or "multiple pages", which in the non-web ICT context becomes either a Set of Documents or a Set of Software Programs. In either case (document or software), whether the criterion applies is dependent upon whether such a set exists, which may require human judgment. Where that set is determined to exist, it may be difficult to employ programmatic testing techniques to verify compliance with the specific criterion.
To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must be searched for blocks of content that are repeated across all of those documents, and checked for a mechanism to skip those repeated blocks. Since the blocks aren't necessarily completely identical (e.g. a repeated listing of all other documents in a set might not include the document containing that list), a tool to do this may not be straightforward, and in any case, no such tool is known to exist today to do this with non-web documents.
Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must be searched for blocks of content that are repeated across all of those applications, and checked for a mechanism to skip those repeated blocks. Since the blocks aren't necessarily completely identical (e.g. a repeated listing of all other software in a set might not include the software application containing that list), a tool to do this may not be straightforward, and in any case, no such tool is known to exist today to do this with non-web software.
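To illustrate why near-identical blocks complicate such a tool, here is a minimal Python sketch that looks for blocks of the first document that recur in every other document, either exactly or by similarity ratio using the standard difflib.SequenceMatcher. Representing each document as a list of text blocks, and the 0.8 threshold, are assumptions made for illustration. A listing of "the other documents" differs in every document, so exact matching finds nothing, while fuzzy matching finds it (at the cost of a threshold that needs human tuning and review).

```python
from difflib import SequenceMatcher


def _similar(a, b, threshold):
    return SequenceMatcher(None, a, b).ratio() >= threshold


def repeated_blocks(documents, threshold=None):
    """documents: list of documents, each a list of text blocks (assumed format).
    Returns the blocks of the first document that recur in every other document.
    threshold=None means exact matching; otherwise a similarity ratio in [0, 1]."""
    first, *others = documents
    result = []
    for block in first:
        if threshold is None:
            found = all(block in doc for doc in others)
        else:
            found = all(any(_similar(block, other, threshold) for other in doc)
                        for doc in others)
        if found:
            result.append(block)
    return result
```

Usage, with each document listing only the other documents in the set:

```python
docs = [
    ["Contents: B, C", "Quarterly revenue figures"],
    ["Contents: A, C", "Installation instructions for the kiosk"],
    ["Contents: A, B", "Glossary of terms"],
]
repeated_blocks(docs)                 # exact matching finds no repeated block
repeated_blocks(docs, threshold=0.8)  # fuzzy matching finds the contents listing
```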
To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must provide multiple mechanisms for locating every other document in the set. As noted by WCAG2ICT, if the documents are on a file system, "it may be possible to browse through the files or programs that make up a set, or search within members of the set for the names of other members. A file directory would be the equivalent of a site map for documents in a set, and a search function in a file system would be equivalent to a web search function for web pages." However, if this is not the case, then the set of documents must expose at least 2 ways of locating every other document in the set. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.
Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must provide multiple mechanisms for locating every other application in the set. As noted by WCAG2ICT, if the software applications are on a file system, "it may be possible to browse through the files or programs that make up a set, or search within members of the set for the names of other members. A file directory would be the equivalent of a site map for documents in a set, and a search function in a file system would be equivalent to a web search function for web pages." However, if this is not the case, then the set of software applications must expose at least 2 ways of locating every other application in the set. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.
To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must be searched for all of the functional components (e.g. tables, figures, graphs, indices), noting how those components are identified. Every document in that set must then be inspected to verify that where they contain the same components as every other document in the set, they are identified in a consistent fashion. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.
Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must be searched for all of the functional components (e.g. menus, dialog boxes, other user interface elements and patterns), noting how those components are identified. Every application in that set must then be inspected to verify that where they contain the same components as every other software application in the set, they are identified in a consistent fashion. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.
The "non-interference" success criteria are things that apply to all areas of the page. As explained in WCAG2ICT's comments on conformance, it wasn't possible to unambiguously carve up software into discrete pieces, and so the unit of evaluation for non-web software is the whole software program. As with any software testing, this can be a very large unit of evaluation, and methods similar to standard software testing might be used. Standard software testing employs both programmatic testing and manual testing – automating what can be automated, and using human inspection otherwise. In the cases below, some level of human inspection or involvement would normally be part of the software testing strategy to verify compliance with these four criteria.
Where non-web documents contain audio, especially audio that automatically plays in certain circumstances (e.g. a slide in a slide deck starts playing a video when that slide is shown), this criterion is typically met through the user agent, software application, or operating system the user is using to interact with the document (rather than through an affordance in the static document itself). Because of this, compliance with this success criterion may be software application or operating system dependent, and therefore difficult to assess outside of a specific, named application or operating system.
Where non-web software contains audio, especially audio that automatically plays in certain circumstances (e.g. making a ringing sound to indicate an incoming call) ...
Section content yet to be written.
Non-web documents rarely if ever include code for responding to keyboard focus. This criterion is typically met through the user agent or software application the user is using to interact with the document (rather than through an affordance in the static document itself). Because of this, compliance with this success criterion may be software application or operating system dependent, and therefore difficult to assess compliance for outside of a specific, named application or operating system. Even then, programmatic testing for this may not be possible.
Where non-web software contains a user interface that can be interacted with from a keyboard, it may be possible to test for this programmatically, though we are not aware of any such test today. Where interaction with the user interface is supported from a keyboard interface provided by an assistive technology (e.g. a Bluetooth keyboard driving a screen reader for a tablet or phone UI), programmatic testing may be especially challenging.
As with audio, where non-web documents contain animation — especially animation that automatically plays in certain circumstances (e.g. a slide in a slide deck starts an animation when that slide is shown) — this criterion is typically met through the user agent or software application the user is using to interact with the document (rather than through an affordance in the static document itself). Because of this, compliance with this success criterion may be software application dependent, and therefore difficult to assess compliance for outside of a specific, named application. Even then, programmatic testing for this may not be possible.
Where non-web software contains animation, especially animation that automatically plays in certain circumstances (e.g. showing a trailer for a movie when the user selects a movie title), this criterion is typically met through some setting in the application, or perhaps in the operating system, to suppress such animations. Because it can be difficult to tell when the animation is not desired by the user and when it is (did the user ask to play a trailer?), this may not be possible to discern programmatically.
While this success criterion may be difficult to programmatically test for in all situations (especially for software applications), there is nothing in this criterion that is otherwise challenging to apply in the non-web ICT context.
Section content yet to be written.
Section content yet to be written.
Section content yet to be written.
Section content yet to be written.
Section content yet to be written.
The purpose of this success criterion is to enable assistive technologies like screen readers to determine the language used for different passages of text on a web page. While some software environments like Java and GNOME/GTK+ support this both for text substrings within a block of text as well as for individual user interface elements, others do not. Therefore, it may not be possible for some software to meet this success criterion. Separately, programmatic testing for this may not be possible, as expert human judgment is needed to determine what the correct language is for some text passages.
Section content yet to be written.
Section content yet to be written.
Section content yet to be written.
Section content yet to be written.
15 of the 38 A and AA criteria in WCAG 2.0 relate to an accessibility supported interface: they are designed with interoperability with assistive technologies in mind. Such interaction may not be possible for many types of software (e.g. software running in a "closed" environment like an airplane ticket kiosk). Thus, in those environments, the only way to address the needs articulated in these criteria may be for the software to be "self-voicing" for blind users who can hear, and otherwise "self-accessible" to the needs of people with other disabilities that are commonly supported via assistive technologies. It may not be feasible to support the needs of all users with disabilities (e.g. including a refreshable braille display in the device to support deaf-blind users, and then maintaining those braille displays to ensure their mechanisms don't get damaged).
This publication has been funded in part with U.S. Federal funds from the Health and Human Services, National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), initially under contract number ED-OSE-10-C-0067 and now under contract number HHSP23301500054C. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Health and Human Services or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.