UKP Snopes Corpus

Dataset

This corpus is based on the Snopes fact-checking website and provides annotations for training machine learning models on four tasks in the fact-checking process: document retrieval, stance detection, evidence identification, and claim validation. The corpus contains 6,422 validated claims, 16,507 evidence text snippets (annotated with sentence-level evidence), and 14,296 documents with their sources (URLs).

Please note: We crawled the data and provide it in accordance with the German text and data mining regulations, which allow us to share the corpus for research purposes only. To download the corpus, you therefore need to contact us.

If you use the corpus in academic work, please cite our CoNLL paper.

Identifier
Source	https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2081
Related Identifier	https://github.com/UKPLab/conll2019-snopes-experiments
Related Identifier	https://github.com/UKPLab/conll2019-snopes-crawling
Metadata Access	https://tudatalib.ulb.tu-darmstadt.de/oai/openairedata?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:tudatalib.ulb.tu-darmstadt.de:tudatalib/2081
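
The Metadata Access URL above has the shape of an OAI-PMH GetRecord request. As a minimal sketch, assuming the endpoint answers with DataCite-style XML, the request URL could be assembled and a response inspected as follows. The helper names `build_get_record_url` and `extract_titles` are hypothetical, and the presence of `<title>` elements in the response is an assumption, not something stated in this record.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and record identifier copied from the Metadata Access URL above.
OAI_ENDPOINT = "https://tudatalib.ulb.tu-darmstadt.de/oai/openairedata"
RECORD_ID = "oai:tudatalib.ulb.tu-darmstadt.de:tudatalib/2081"


def build_get_record_url(identifier=RECORD_ID, metadata_prefix="oai_datacite"):
    """Assemble the OAI-PMH GetRecord URL for one record (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' literal, matching the URL listed above
    )
    return f"{OAI_ENDPOINT}?{query}"


def extract_titles(xml_text):
    """Collect the text of every <title> element, ignoring XML namespaces.

    Assumption: the response uses a DataCite-like layout in which titles
    appear as <title> elements. Adjust the tag name if it does not.
    """
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "title" and el.text
    ]
```

To use it, pass the body returned by `urllib.request.urlopen(build_get_record_url())` (decoded as UTF-8) to `extract_titles`. This retrieves the public metadata only; obtaining the corpus itself still requires contacting the authors as described above.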

Provenance
Creator	Hanselowski, Andreas; Stab, Christian; Schulz, Claudia; Li, Zile; Gurevych, Iryna
Publisher	TU Darmstadt
Publication Year	2019
Rights	in Copyright; info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	https://tudatalib.ulb.tu-darmstadt.de/page/contact

Representation
Language	English
Resource Type	Dataset
Format	application/zip
Discipline	Other