Slovene translation of the SQuAD2.0 dataset

Dataset

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is either a segment of text (a span) from the corresponding reading passage, or the question is unanswerable. SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. The English version of SQuAD2.0 was machine-translated into Slovene, and the translation was then manually reviewed and corrected where needed. The data is provided in JSON format and consists of a training set and a validation set.
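As a rough illustration of how the JSON data might be inspected, the sketch below counts answerable and unanswerable questions and checks that each answer span matches its passage. It assumes the Slovene files keep the field names of the English SQuAD2.0 release (data, paragraphs, context, qas, answers, answer_start, is_impossible), which this record does not confirm. The record in TOY is a made-up example, not taken from the dataset, and the file name in the usage note is hypothetical.

```python
import json
from collections import Counter

# Made-up toy record in the English SQuAD2.0 layout (an assumption for the
# Slovene files); it is NOT taken from the dataset itself.
TOY = {
    "version": "v2.0",
    "data": [{
        "title": "Ljubljana",
        "paragraphs": [{
            "context": "Ljubljana je glavno mesto Slovenije.",
            "qas": [
                {"id": "q1",
                 "question": "Katero mesto je glavno mesto Slovenije?",
                 "answers": [{"text": "Ljubljana", "answer_start": 0}],
                 "is_impossible": False},
                {"id": "q2",
                 "question": "Koliko prebivalcev ima Maribor?",
                 "answers": [],
                 "is_impossible": True},
            ],
        }],
    }],
}


def summarize_squad(dataset: dict) -> Counter:
    """Count articles, paragraphs, and answerable/unanswerable questions."""
    counts = Counter()
    for article in dataset["data"]:
        counts["articles"] += 1
        for paragraph in article["paragraphs"]:
            counts["paragraphs"] += 1
            for qa in paragraph["qas"]:
                # SQuAD2.0 flags unanswerable questions with is_impossible.
                if qa.get("is_impossible", False):
                    counts["unanswerable"] += 1
                else:
                    counts["answerable"] += 1
    return counts


def mismatched_spans(dataset: dict) -> list:
    """Return ids of questions whose answer text is not found at answer_start."""
    bad_ids = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa.get("answers", []):
                    start = answer["answer_start"]
                    text = answer["text"]
                    # Character offsets can drift when passages are translated.
                    if context[start:start + len(text)] != text:
                        bad_ids.append(qa["id"])
    return bad_ids


def load_squad(path: str) -> dict:
    """Read a SQuAD-style JSON file encoded as UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With a downloaded file, usage might look like summarize_squad(load_squad("train.json")), where "train.json" stands in for whatever the training file is actually called. The span check is included because character offsets are the part of the format most likely to need adjusting after translation.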

Identifier
PID	http://hdl.handle.net/11356/1756
Related Identifier	https://rajpurkar.github.io/SQuAD-explorer/
Related Identifier	https://rsdo.slovenscina.eu/
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1756

Provenance
Creator	Borovič, Mladen; Žagar, Kristjan; Ferme, Marko; Majninger, Sandi; Ojsteršek, Milan; Šmajdek, Uroš; Zirkelbach, Maj; Zupanič, Matjaž; Jazbinšek, Meta; Žitnik, Slavko; Robnik-Šikonja, Marko
Publisher	Faculty of Electrical Engineering and Computer Science, University of Maribor; Faculty of Computer and Information Science, University of Ljubljana; Faculty of Arts, University of Ljubljana
Publication Year	2022
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); PUB; https://creativecommons.org/licenses/by/4.0/
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 2
Discipline	Linguistics