AccessGuru


The AccessGuru dataset contains over 3,500 real-world Web accessibility violations collected from 448 diverse websites across domains such as health, education, government, news, technology, and e-commerce. Each instance is annotated with one of 112 distinct violation types, spanning syntactic, semantic, and layout categories, as defined by the WCAG 2.1 guidelines. The dataset includes HTML code snippets, associated metadata, and the supplementary information (e.g., color values, images) needed for detection and correction tasks. It is the first large-scale benchmark to cover all three categories of accessibility violations jointly, enabling reproducible evaluation of automated accessibility testing tools, large language models (LLMs), and assistive technologies.

Identifier
DOI https://doi.org/10.18419/DARUS-5177
Related Identifier IsSupplementTo https://doi.org/10.1145/3663547.3746360
Metadata Access https://darus.uni-stuttgart.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18419/DARUS-5177
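The Metadata Access link is a standard OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from its three query parameters with Python's standard library; the function name `get_record_url` is hypothetical, and only the endpoint and parameter values come from this record.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access link in this record.
OAI_BASE = "https://darus.uni-stuttgart.de/oai"

def get_record_url(identifier: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord request URL for one item (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # urlencode percent-encodes ':' and '/' in the identifier;
    # standard query-string decoding yields the original value.
    return f"{OAI_BASE}?{urlencode(params)}"

if __name__ == "__main__":
    print(get_record_url("doi:10.18419/DARUS-5177"))
```

Swapping `metadata_prefix` for another format the server advertises (for example the mandatory `oai_dc`) would request the same record in that schema.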
Provenance
Creator Fathallah, Nadeen (ORCID: 0000-0001-7921-034X); Hernández, Daniel; Staab, Steffen
Publisher DaRUS
Contributor Hernandez, Daniel
Publication Year 2025
Funding Reference Stuttgart Research Focus Interchange Forum for Reflection on Intelligent Systems
Rights CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Hernandez, Daniel (University of Stuttgart)
Representation
Resource Type Dataset
Format application/zip; text/tab-separated-values
Size (bytes) 493695002; 47254306; 20702189; 98904
Version 1.0
Discipline Other
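Part of the dataset is distributed as tab-separated values (see Format). A minimal sketch of tallying instances per violation type with Python's standard `csv` module follows. The record does not document the TSV schema, so the column name `violation_type`, the helper `count_violation_types`, and the sample rows are all hypothetical; check the header row of the downloaded file and pass the matching column name.

```python
import csv
import io
from collections import Counter
from typing import TextIO

def count_violation_types(tsv: TextIO, column: str = "violation_type") -> Counter:
    """Tally rows per value of `column` in a tab-separated file.

    `column` is a hypothetical header name, not taken from the dataset.
    """
    reader = csv.DictReader(tsv, delimiter="\t")
    # Accessing fieldnames reads the header row; None means an empty file.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    return Counter(row[column] for row in reader)

if __name__ == "__main__":
    # Tiny made-up sample, NOT real AccessGuru data.
    sample = io.StringIO("id\tviolation_type\n1\ttype-a\n2\ttype-b\n3\ttype-a\n")
    print(count_violation_types(sample).most_common())
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object; HTML snippets may contain tabs or newlines, which the `csv` module handles only if the fields are quoted in the file.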