CoNLL 2018 Shared Task - UDPipe Baseline Models and Supplementary Materials

Baseline UDPipe models for CoNLL 2018 Shared Task in UD Parsing, and supplementary material.

The models require UDPipe version 1.2 or later and are evaluated using the official evaluation script. For treebanks that provide no development data, the models were trained using a custom train-dev split. We also trained an additional "Mixed" model, which uses 200 sentences from the training data of every treebank. All information needed to replicate the model training (hyperparameters, the modified train-dev split, and pre-computed word embeddings for the parser) is included in the archive.
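
As an illustration of how training data for a "Mixed" model could be assembled, the following minimal sketch concatenates a fixed number of sentences from several CoNLL-U treebanks. The record does not say how the 200 sentences were selected, so taking the first ones is an assumption, and the function names are hypothetical rather than part of the archive.

```python
def split_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks (blank-line separated)."""
    blocks = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:  # last sentence without a trailing blank line
        blocks.append("\n".join(current))
    return blocks


def build_mixed_training_data(treebank_texts, per_treebank=200):
    """Concatenate up to `per_treebank` sentences from each treebank.

    Assumption: the first sentences of each treebank are used; the
    original selection strategy is not documented in this record.
    """
    mixed = []
    for text in treebank_texts:
        mixed.extend(split_conllu_sentences(text)[:per_treebank])
    # In CoNLL-U, every sentence is terminated by a blank line.
    return "".join(block + "\n\n" for block in mixed)
```

Each element of `treebank_texts` is expected to be the full content of one treebank's training file in CoNLL-U format.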

Additionally, we provide the UD 2.2 CoNLL 2018 training data with automatically predicted morphology. The development data is annotated directly with the baseline models, while the training data is annotated by 10-fold jack-knifing (each fold is predicted with a model trained on the remaining folds).
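
The jack-knifing procedure can be sketched as follows. This is a minimal illustration, not the script used to produce the data: `train` and `predict` are hypothetical callables standing in for model training and tagging, and the interleaved assignment of sentences to folds is an assumption.

```python
def jackknife(sentences, train, predict, folds=10):
    """Annotate every sentence with a model that never saw it in training.

    `train(list_of_sentences)` returns a model and
    `predict(model, sentence)` returns the annotated sentence;
    both are placeholders supplied by the caller.
    """
    predicted = [None] * len(sentences)
    for k in range(folds):
        # Assumption: sentence i belongs to fold i % folds.
        held_out = range(k, len(sentences), folds)
        held_out_set = set(held_out)
        training = [s for i, s in enumerate(sentences) if i not in held_out_set]
        model = train(training)
        for i in held_out:
            predicted[i] = predict(model, sentences[i])
    return predicted
```

Because every sentence is predicted by a model trained on the other folds, the predicted morphology in the training data has error rates closer to what a parser will see at test time than annotations produced by a model trained on the same sentences.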

Identifier
PID http://hdl.handle.net/11234/1-2859
Related Identifier http://ufal.mff.cuni.cz/udpipe
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-2859
Provenance
Creator Straka, Milan
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2018
Rights Licence Universal Dependencies v2.2; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Multiple languages
Resource Type languageDescription
Format text/plain; charset=utf-8; application/x-xz; downloadable_files_count: 2
Discipline Linguistics