Slovene translation of the SQuAD2.0 dataset

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is either a segment of text (a span) from the corresponding reading passage, or the question is unanswerable. SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. The English version of SQuAD2.0 was machine-translated into Slovene, and the translation was then manually reviewed and corrected where needed. The data is provided in JSON format and consists of a training set and a validation set.
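
As a rough illustration of working with the JSON files, the sketch below counts answerable and unanswerable questions and checks that each answer span lines up with its passage. It assumes the Slovene files keep the layout of the original English SQuAD2.0 release (`data` > `paragraphs` > `qas`, with `is_impossible`, `answers`, `text` and `answer_start`); this record does not confirm that. The helper names and the tiny `SAMPLE` document are invented for the example and are not taken from the dataset.

```python
import json
from typing import Any, Dict, List


def load_squad(path: str) -> Dict[str, Any]:
    """Read a SQuAD-style JSON file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_squad(dataset: Dict[str, Any]) -> Dict[str, int]:
    """Count articles, paragraphs, and answerable/unanswerable questions."""
    counts = {"articles": 0, "paragraphs": 0, "answerable": 0, "unanswerable": 0}
    for article in dataset.get("data", []):
        counts["articles"] += 1
        for paragraph in article.get("paragraphs", []):
            counts["paragraphs"] += 1
            for qa in paragraph.get("qas", []):
                # Assumed SQuAD2.0 convention: unanswerable questions set is_impossible.
                key = "unanswerable" if qa.get("is_impossible", False) else "answerable"
                counts[key] += 1
    return counts


def misaligned_answers(dataset: Dict[str, Any]) -> List[str]:
    """Return ids of questions whose answer text is not found at answer_start."""
    bad_ids = []
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                for answer in qa.get("answers", []):
                    start = answer["answer_start"]
                    if context[start:start + len(answer["text"])] != answer["text"]:
                        bad_ids.append(qa["id"])
                        break
    return bad_ids


# Invented miniature document in the assumed layout; not real dataset content.
SAMPLE = {
    "version": "v2.0",
    "data": [{
        "title": "Ljubljana",
        "paragraphs": [{
            "context": "Ljubljana je glavno mesto Slovenije.",
            "qas": [
                {"id": "q1", "question": "Katero je glavno mesto Slovenije?",
                 "is_impossible": False,
                 "answers": [{"text": "Ljubljana", "answer_start": 0}]},
                {"id": "q2", "question": "Koliko prebivalcev ima Maribor?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

summary = summarize_squad(SAMPLE)
```

For the real files, `summarize_squad(load_squad(path))` would be called with the path of the downloaded training or validation file. The span check is worth running on translated data in particular, because character offsets from the English original do not carry over to the Slovene passages.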

Identifier
PID http://hdl.handle.net/11356/1756
Related Identifier https://rajpurkar.github.io/SQuAD-explorer/
Related Identifier https://rsdo.slovenscina.eu/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1756
Provenance
Creator Borovič, Mladen; Žagar, Kristjan; Ferme, Marko; Majninger, Sandi; Ojsteršek, Milan; Šmajdek, Uroš; Zirkelbach, Maj; Zupanič, Matjaž; Jazbinšek, Meta; Žitnik, Slavko; Robnik-Šikonja, Marko
Publisher Faculty of Electrical Engineering and Computer Science, University of Maribor; Faculty of Computer and Information Science, University of Ljubljana; Faculty of Arts, University of Ljubljana
Publication Year 2022
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 2
Discipline Linguistics
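
The Metadata Access entry above is an OAI-PMH `GetRecord` request for the Dublin Core (`oai_dc`) form of this record. The sketch below shows how such a URL can be composed and how Dublin Core elements could be read from a response. The function names are hypothetical, and `SAMPLE_XML` is a hand-written fragment using values from this record, not an actual server response; no network request is made.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access entry of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
# Standard Dublin Core element namespace.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Compose an OAI-PMH GetRecord URL for one record identifier."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": metadata_prefix, "identifier": identifier},
        safe=":/",  # keep ':' and '/' readable, as in the URL listed in the record
    )
    return OAI_ENDPOINT + "?" + query


def dc_values(xml_text: str, element: str) -> List[str]:
    """Collect the text of every Dublin Core element with the given local name."""
    root = ET.fromstring(xml_text)
    tag = "{" + DC_NAMESPACE + "}" + element
    return [node.text for node in root.iter(tag) if node.text]


# Hand-written fragment for illustration only; a real response has more wrapping elements.
SAMPLE_XML = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Slovene translation of the SQuAD2.0 dataset</dc:title>"
    "<dc:identifier>http://hdl.handle.net/11356/1756</dc:identifier>"
    "</record>"
)

url = build_get_record_url("oai:www.clarin.si:11356/1756")
titles = dc_values(SAMPLE_XML, "title")
```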