CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies

This package contains the test data in the form in which they were presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files list the files to process and the files to output; README files in the respective folders describe the syntax of metadata.json.
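As a minimal sketch of how the metadata.json files could be located after unpacking, the following Python snippet walks a directory tree and parses every metadata.json it finds. It deliberately assumes nothing about the structure of the JSON content, since that is described only in the README files; the function name load_metadata_files and the "." root directory are illustrative placeholders, not part of the package.

```python
import json
from pathlib import Path


def load_metadata_files(package_root):
    """Return {path: parsed JSON} for every metadata.json under package_root.

    Hypothetical helper: the JSON is parsed as-is, with no assumption
    about its schema (see the README files in the package for that).
    """
    found = {}
    for path in sorted(Path(package_root).rglob("metadata.json")):
        with path.open(encoding="utf-8") as handle:
            found[path] = json.load(handle)
    return found


if __name__ == "__main__":
    # "." is a placeholder: point it at the directory where the archive was unpacked.
    for path, content in load_metadata_files(".").items():
        # Print only the location and the top-level JSON type of each file.
        print(path, type(content).__name__)
```

Printing only the top-level type keeps the sketch independent of the actual metadata layout, which should be taken from the README in each folder.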

For full training, development and gold standard test data, see Universal Dependencies 2.0 (CoNLL 2017) and Universal Dependencies 2.2 (CoNLL 2018). See the download links at http://universaldependencies.org/.

For more information on the shared tasks, see http://universaldependencies.org/conll17/ and http://universaldependencies.org/conll18/.

Contents:

conll17-ud-test-2017-05-09 ... CoNLL 2017 test data
conll18-ud-test-2018-05-06 ... CoNLL 2018 test data
conll18-ud-test-2018-05-06-for-conll17 ... CoNLL 2018 test data with metadata and filenames modified so that it is digestible by the 2017 systems

Identifier
PID http://hdl.handle.net/11234/1-2899
Related Identifier http://universaldependencies.org/conll18/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-2899
Provenance
Creator Zeman, Daniel; Straka, Milan
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2018
Rights Licence Universal Dependencies v2.2; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Afrikaans; Arabic; Breton; Bulgarian; Catalan; Valencian; Czech; Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic; Danish; German; Greek, Modern (1453-); Greek; English; Estonian; Basque; Faroese; Persian; Farsi; Finnish; French; French, Old (842-ca.1400); Irish; Galician; Gothic; Greek, Ancient (to 1453); Hebrew; Hindi; Croatian; Upper Sorbian; Hungarian; Armenian; Indonesian; Italian; Japanese; Kazakh; Korean; Latin; Latvian; Dutch; Flemish; Norwegian; Polish; Portuguese; Romanian; Moldavian; Moldovan; Russian; Slovak; Slovenian; Slovene; Northern Sami; Spanish; Castilian; Serbian; Swedish; Thai; Turkish; Uighur; Uyghur; Ukrainian; Urdu; Vietnamese; Chinese
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
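
The Metadata Access entry above is an OAI-PMH GetRecord request that asks for the record in the oai_dc (Dublin Core) format. As a small illustration, the sketch below splits that URL into its endpoint and query parameters using only the Python standard library. The function name oai_parameters is a hypothetical helper introduced here, and no network request is made.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the "Metadata Access" field of this record.
METADATA_URL = (
    "http://lindat.mff.cuni.cz/repository/oai/request"
    "?verb=GetRecord&metadataPrefix=oai_dc"
    "&identifier=oai:lindat.mff.cuni.cz:11234/1-2899"
)


def oai_parameters(url):
    """Split an OAI-PMH request URL into (endpoint, parameter dict).

    Hypothetical helper for illustration; it only parses the string.
    """
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns a list per key; each key occurs once here.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return endpoint, params


if __name__ == "__main__":
    endpoint, params = oai_parameters(METADATA_URL)
    print(endpoint)
    for key, value in params.items():
        print(f"  {key} = {value}")
```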