The AccessGuru dataset contains more than 3,500 real-world Web accessibility violations collected from 448 diverse websites in domains such as health, education, government, news, technology, and e-commerce. Each instance is annotated with one of 112 distinct violation types, grounded in WCAG 2.1 and spanning syntactic, semantic, and layout categories. The dataset includes HTML code snippets, associated metadata, and supplementary information (e.g., color values and images) needed for detection and correction tasks. It is the first large-scale benchmark to jointly cover all three categories of accessibility violations, enabling reproducible evaluation of automated accessibility testing tools, large language models (LLMs), and assistive technologies.
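To make the composition of an instance more concrete, the sketch below models one annotated violation as a Python dataclass and tallies instances per category, a typical first step before reporting per-category results. This is a minimal illustration under stated assumptions, not the dataset's actual schema: the class `ViolationInstance`, its field names, the helper `count_by_category`, the violation-type identifiers, and the `example.org` URLs are all hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Literal

# The three violation categories named in the dataset description.
Category = Literal["syntactic", "semantic", "layout"]

@dataclass
class ViolationInstance:
    """One annotated violation. All field names here are hypothetical."""
    url: str              # page the snippet was collected from (placeholder URLs below)
    domain: str           # website domain, e.g. "health", "education", "news"
    violation_type: str   # one of the 112 violation types (identifier format assumed)
    category: Category    # syntactic, semantic, or layout
    html_snippet: str     # the offending HTML code
    supplementary: dict = field(default_factory=dict)  # e.g. color values, image paths

def count_by_category(instances):
    """Hypothetical helper: count how many instances fall into each category."""
    return Counter(inst.category for inst in instances)

# Three toy instances, one per category; violation-type names are illustrative only.
sample = [
    ViolationInstance("https://example.org/a", "health", "missing-alt-text",
                      "syntactic", '<img src="logo.png">'),
    ViolationInstance("https://example.org/b", "news", "low-color-contrast",
                      "layout", '<p style="color:#999;background:#fff">Hi</p>',
                      {"foreground": "#999999", "background": "#ffffff"}),
    ViolationInstance("https://example.org/c", "education", "uninformative-alt-text",
                      "semantic", '<img src="chart.png" alt="image">'),
]

print(count_by_category(sample))
```

Keeping supplementary information (such as color values) in a separate field reflects the description above: some violations, like contrast issues, cannot be judged from the HTML snippet alone.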