Universal Dependencies 2.0

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

This release is special in that the treebanks will be used as training/development data in the CoNLL 2017 shared task (http://universaldependencies.org/conll17/). Test data are not released, except for the few treebanks that do not take part in the shared task. The shared task covers 64 treebanks in the following 45 languages: Ancient Greek, Arabic, Basque, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Urdu, Uyghur and Vietnamese.

This release fixes a bug in http://hdl.handle.net/11234/1-1976. Changed files: ud-tools-v2.0.tgz (conllu_to_text.pl, conllu_to_conllx.pl; added text_without_spaces.pl), ud-treebanks-conll2017.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt) and ud-treebanks-v2.0.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt, ar_nyuad-ud-dev.txt, ar_nyuad-ud-test.txt, ar_nyuad-ud-train.txt, cop-ud-dev.txt, cop-ud-test.txt, cop-ud-train.txt, sa-ud-dev.txt, sa-ud-test.txt, sa-ud-train.txt).
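
The changed files listed above appear to follow a common naming pattern: a language code, an optional treebank code after an underscore, the literal "-ud-", a data split, and a file extension. The following is a minimal Python sketch that splits such a name into its parts. The pattern is inferred only from the file names in this record and is an assumption, not a documented specification; the names `UDFileName` and `parse_ud_file_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class UDFileName(NamedTuple):
    """Parts of a treebank file name such as 'fi_ftb-ud-train.txt'."""
    language: str
    treebank: Optional[str]
    split: str
    extension: str


# Assumed pattern, inferred from the names listed in this record:
#   <language>[_<treebank>]-ud-<train|dev|test>.<extension>
_PATTERN = re.compile(
    r"^(?P<language>[a-z]+)"
    r"(?:_(?P<treebank>[a-z0-9]+))?"
    r"-ud-(?P<split>train|dev|test)"
    r"\.(?P<extension>[a-z]+)$"
)


def parse_ud_file_name(name: str) -> UDFileName:
    """Split a file name into language, treebank, split and extension."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    # groupdict() yields None for the treebank when the optional part is absent.
    return UDFileName(**match.groupdict())


if __name__ == "__main__":
    for file_name in ("fi_ftb-ud-train.txt", "cop-ud-test.txt", "ar_nyuad-ud-dev.txt"):
        print(file_name, parse_ud_file_name(file_name))
```

Names that do not fit the assumed pattern raise `ValueError` rather than being guessed at, which makes any deviation in the archives visible.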

Identifier
PID http://hdl.handle.net/11234/1-1983
Related Identifier http://hdl.handle.net/11234/1-1827
Related Identifier http://hdl.handle.net/11234/1-2515
Related Identifier http://universaldependencies.org/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1983
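
The Metadata Access URL above is a request with three query parameters (verb, metadataPrefix, identifier). As an illustration, the Python sketch below rebuilds that URL from the handle of this record. The function name `build_get_record_url` is hypothetical, and the assumption that other handles in the same repository use the same "oai:lindat.mff.cuni.cz:" identifier prefix is inferred from this one URL only.

```python
from urllib.parse import urlencode

# Base endpoint, copied from the Metadata Access URL of this record.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/request"


def build_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a GetRecord request URL for a handle such as '11234/1-1983'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        # Assumption: the identifier is the handle with this fixed prefix.
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL in the record.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    print(build_get_record_url("11234/1-1983"))
```

The sketch only constructs the URL; it does not send the request or parse the response.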
Provenance
Creator Nivre, Joakim; Agić, Željko; Ahrenberg, Lars; Aranzabe, Maria Jesus; Asahara, Masayuki; Atutxa, Aitziber; Ballesteros, Miguel; Bauer, John; Bengoetxea, Kepa; Bhat, Riyaz Ahmad; Bick, Eckhard; Bosco, Cristina; Bouma, Gosse; Bowman, Sam; Candito, Marie; Cebiroğlu Eryiğit, Gülşen; Celano, Giuseppe G. A.; Chalub, Fabricio; Choi, Jinho; Çöltekin, Çağrı; Connor, Miriam; Davidson, Elizabeth; de Marneffe, Marie-Catherine; de Paiva, Valeria; Diaz de Ilarraza, Arantza; Dobrovoljc, Kaja; Dozat, Timothy; Droganova, Kira; Dwivedi, Puneet; Eli, Marhaba; Erjavec, Tomaž; Farkas, Richárd; Foster, Jennifer; Freitas, Cláudia; Gajdošová, Katarína; Galbraith, Daniel; Garcia, Marcos; Ginter, Filip; Goenaga, Iakes; Gojenola, Koldo; Gökırmak, Memduh; Goldberg, Yoav; Gómez Guinovart, Xavier; Gonzáles Saavedra, Berta; Grioni, Matias; Grūzītis, Normunds; Guillaume, Bruno; Habash, Nizar; Hajič, Jan; Hà Mỹ, Linh; Haug, Dag; Hladká, Barbora; Hohle, Petter; Ion, Radu; Irimia, Elena; Johannsen, Anders; Jørgensen, Fredrik; Kaşıkara, Hüner; Kanayama, Hiroshi; Kanerva, Jenna; Kotsyba, Natalia; Krek, Simon; Laippala, Veronika; Lê Hồng, Phương; Lenci, Alessandro; Ljubešić, Nikola; Lyashevskaya, Olga; Lynn, Teresa; Makazhanov, Aibek; Manning, Christopher; Mărănduc, Cătălina; Mareček, David; Martínez Alonso, Héctor; Martins, André; Mašek, Jan; Matsumoto, Yuji; McDonald, Ryan; Missilä, Anna; Mititelu, Verginica; Miyao, Yusuke; Montemagni, Simonetta; More, Amir; Mori, Shunsuke; Moskalevskyi, Bohdan; Muischnek, Kadri; Mustafina, Nina; Müürisep, Kaili; Nguyễn Thị, Lương; Nguyễn Thị Minh, Huyền; Nikolaev, Vitaly; Nurmi, Hanna; Ojala, Stina; Osenova, Petya; Øvrelid, Lilja; Pascual, Elena; Passarotti, Marco; Perez, Cenel-Augusto; Perrier, Guy; Petrov, Slav; Piitulainen, Jussi; Plank, Barbara; Popel, Martin; Pretkalniņa, Lauma; Prokopidis, Prokopis; Puolakainen, Tiina; Pyysalo, Sampo; Rademaker, Alexandre; Ramasamy, Loganathan; Real, Livy; Rituma, Laura; Rosa, Rudolf; Saleh, Shadi; Sanguinetti, Manuela; Saulīte, Baiba; Schuster, Sebastian; Seddah, Djamé; Seeker, Wolfgang; Seraji, Mojgan; Shakurova, Lena; Shen, Mo; Sichinava, Dmitry; Silveira, Natalia; Simi, Maria; Simionescu, Radu; Simkó, Katalin; Šimková, Mária; Simov, Kiril; Smith, Aaron; Suhr, Alane; Sulubacak, Umut; Szántó, Zsolt; Taji, Dima; Tanaka, Takaaki; Tsarfaty, Reut; Tyers, Francis; Uematsu, Sumire; Uria, Larraitz; van Noord, Gertjan; Varga, Viktor; Vincze, Veronika; Washington, Jonathan North; Žabokrtský, Zdeněk; Zeldes, Amir; Zeman, Daniel; Zhu, Hanzhi
Publisher Universal Dependencies Consortium
Publication Year 2017
Rights Licence Universal Dependencies v2.0; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Greek, Ancient (to 1453); Arabic; Basque; Bulgarian; Croatian; Czech; Danish; Dutch / Flemish; English; Estonian; Finnish; French; German; Gothic; Greek, Modern (1453-) / Greek; Hebrew; Hindi; Hungarian; Indonesian; Irish; Italian; Japanese; Latin; Norwegian; Church Slavic / Old Slavonic / Church Slavonic / Old Bulgarian / Old Church Slavonic; Persian / Farsi; Polish; Portuguese; Romanian / Moldavian / Moldovan; Slovenian / Slovene; Spanish / Castilian; Swedish; Tamil; Catalan / Valencian; Chinese; Galician; Kazakh; Latvian; Russian; Turkish; Coptic; Sanskrit / Saṁskṛta; Slovak; Ukrainian; Uighur / Uyghur; Vietnamese; Belarusian; Korean; Lithuanian; Urdu
Resource Type corpus
Format text/plain; charset=utf-8; application/x-gzip; downloadable_files_count: 4
Discipline Linguistics