Slavic Forest, Norwegian Wood (models)

PID

Trained models for UDPipe used to produce our final submission to the Vardial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the SV model on a concatenation of DA and NO data. The scripts and commands used to create the models are part of separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe -- their functionality with newer or older versions of UDPipe is not guaranteed.

We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLLU format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission.

SK -- tag and parse with the model:

udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu

A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way (udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu).

HR -- prune the Features to keep only Case and parse with the model:

python3 feats2FEAT.py Case tmp

udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe

Identifier
PID http://hdl.handle.net/11234/1-1971
Related Identifier http://web.science.mq.edu.au/~smalmasi/vardial4/pdf/VarDial26.pdf
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1971
Provenance
Creator Rosa, Rudolf; Zeman, Daniel; Mareček, David; Žabokrtský, Zdeněk
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2017
Funding Reference info:eu-repo/grantAgreement/EC/H2020/644402
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Slovak; Croatian; Norwegian
Resource Type toolService
Format application/octet-stream; downloadable_files_count: 5
Discipline Linguistics