Universal Segmentations 1.5 (UniSegments 1.5)

Universal Segmentations (UniSegments) is a collection of lexical resources that capture morphological segmentations, harmonized into a cross-linguistically consistent annotation scheme. The data files use a simple tab-separated format in which each entry represents a word and its morphological segmentation. Entries also carry additional information such as part-of-speech categories and morph types.
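As an illustration of how such tab-separated entries might be read, the following is a minimal sketch only. The column order (word, part of speech, segmentation), the " + " morph separator, and the sample rows are assumptions made for this example and are not taken from the UniSegments specification; consult the documentation shipped with the data for the actual layout.

```python
import csv
import io
from typing import Iterator, List, NamedTuple


class Entry(NamedTuple):
    word: str
    pos: str
    morphs: List[str]


def read_entries(handle, morph_sep: str = " + ") -> Iterator[Entry]:
    """Yield one Entry per tab-separated line.

    ASSUMPTION: columns are word, POS, segmentation, in that order,
    and morphs are joined by `morph_sep`. Both are hypothetical.
    """
    # QUOTE_NONE keeps quote characters inside words as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        word, pos, segmentation = row[0], row[1], row[2]
        yield Entry(word, pos, segmentation.split(morph_sep))


# Hypothetical sample rows, invented for this sketch (not real dataset content).
SAMPLE = "unhappiness\tNOUN\tun + happi + ness\ncats\tNOUN\tcat + s\n"

entries = list(read_entries(io.StringIO(SAMPLE)))
for entry in entries:
    print(entry.word, entry.pos, entry.morphs)
```

Because the record lists the download as gzip-compressed UTF-8 text, a compressed file could be passed to the same function by opening it with Python's standard `gzip.open(path, "rt", encoding="utf-8")`, where `path` stands for wherever the archive has been saved.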

The second publicly available version of this collection, UniSegments v1.5, includes 62 harmonized segmentation datasets covering 46 languages from various language families.

Identifier
PID http://hdl.handle.net/11234/1-6130
Related Identifier https://ufal.mff.cuni.cz/universal-segmentations
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-6130
Provenance
Creator John, Vojtěch; Žabokrtský, Zdeněk; Reeves, Benjamin; Ševčíková, Magda; Abdukerim, Haridanmu; Abduwaiti, Abduhalik; Abliz, Abdukerim; Ansari, Ebrahim; Arkhangelskiy, Timofey; Astapenka, Lizaveta; Bafna, Niyati; Batsuren, Khuyagbaatar; Bella, Gábor; Bertinetto, Pier Marco; Celata, Chiara; Deacon, Hélène; Diamantopoulos, Konstantinos; Dohnalová, Šárka; Fedorenko, Alexei; Filko, Matea; Forst, Antonín; Gamba, Federica; Garipov, Timur; Gaustad, Tanja; Giunchiglia, Fausto; Glazkova, Anna; Haghdoost, Hamid; Hathout, Nabil; Khomchenkova, Irina; Khurshudyan, Victoria; Klyucheva, Maria; Kyjánek, Lukáš; Litta, Eleonora Maria; Lyashevskaya, Olga; Macoir, Joël; Mailhot, Hugo; Maosong, Sun; McKellar, Cindy; Medvedeva, Maria; Morozov, Dmitry; Muralikrishna, S.N.; Namer, Fiametta; Nedoluzhko, Anna; Nikravesh, Mahshid; Papáček, Aleš Manuel; Passarotti, Marco; Polyakov, Alexey; Potapov, Mihail; Pruthwik, Mishra; Raftopoulou, Chrysanthi; Rao, Ashwath; Samar, Husain; Sánchez-Gutiérrez, Claudia; Savelev-Galiaminskii, Iurii; Sharma, Dipti Misra; Slavíčková, Eleonora; Stephen, Abishek; Svoboda, Emil; Šojat, Krešimir; Štefanec, Vanja; Talamo, Luigi; Vidra, Jonáš; Vydrin, Arseniy; Wilson, Maximiliano A.; Yang, Liu; Zakirova, Aigul
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2026
Rights Universal Segmentations 1.5 License Terms; https://lindat.mff.cuni.cz/repository/static/licence-unisegs-1.5.html; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Belarusian; Bengali; Bangla; Catalan; Valencian; Czech; German; Greek, Modern (1453-); Greek; English; Esperanto; Persian; Farsi; Finnish; French; Hindi; Croatian; Hungarian; Armenian; Italian; Kannada; Latin; Malayalam; Marathi; Marāṭhī; Moksha; Mongolian; Erzya; Ndebele, South; South Ndebele; Southern Ndebele; Pedi; Sepedi; Northern Sotho; Polish; Portuguese; Russian; Sotho, Southern; Southern Sotho; Spanish; Castilian; Swati; Swedish; Telugu; Tajik; Tswana; Tsonga; Udmurt; Uighur; Uyghur; Ukrainian; Venda; Xhosa; Chinese; Zulu
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/x-gzip; downloadable_files_count: 1
Discipline Linguistics